Unverified Commit 16f0b7d7 authored by lewtun, committed by GitHub

Update ONNX docs (#14904)



* Remove docs for deprecated ONNX export

* Tidy up the CLI help messages

* Revamp ONNX docs

* Update auto-config table

* Use DistilBERT as example for consistency

* Wrap up first pass at ONNX docs

* Fix table check

* Add tweaks and introduction

* Add cross-ref

* Fix missing import

* Fix style

* Add permalinks to ONNX configs

* Clarify role of OrderedDict

* Update docs/source/serialization.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add doctest syntax to code blocks

* Remove permalinks

* Revert "Remove permalinks"

This reverts commit 099701daf0db27823457867938efdb2d4f22a7c1.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
parent 704d1fec
@@ -10,27 +10,38 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# Exporting 🤗 Transformers Models

If you need to deploy 🤗 Transformers models in production environments, we
recommend exporting them to a serialized format that can be loaded and executed
on specialized runtimes and hardware. In this guide, we'll show you how to
export 🤗 Transformers models in two widely used formats: ONNX and TorchScript.

Once exported, a model can be optimized for inference via techniques such as
quantization and pruning. If you are interested in optimizing your models to run
with maximum efficiency, check out the [🤗 Optimum
library](https://github.com/huggingface/optimum).

## ONNX

The [ONNX (Open Neural Network eXchange)](http://onnx.ai) project is an open
standard that defines a common set of operators and a common file format to
represent deep learning models in a wide variety of frameworks, including
PyTorch and TensorFlow. When a model is exported to the ONNX format, these
operators are used to construct a computational graph (often called an
_intermediate representation_) which represents the flow of data through the
neural network.

By exposing a graph with standardized operators and data types, ONNX makes it
easy to switch between frameworks. For example, a model trained in PyTorch can
be exported to ONNX format and then imported in TensorFlow (and vice versa).
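As a rough sketch of this interoperability, an exported graph could be loaded into TensorFlow with the third-party `onnx-tf` package (an assumption on our part; that package and its API are not part of 🤗 Transformers and may change between versions):

```python
import onnx
from onnx_tf.backend import prepare  # third-party onnx-tf package, assumed installed

# Load the exported ONNX graph and convert it to a TensorFlow representation
onnx_model = onnx.load("model.onnx")
tf_rep = prepare(onnx_model)

# Persist the converted model in TensorFlow's own format
tf_rep.export_graph("model_tf")
```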
🤗 Transformers provides a `transformers.onnx` package that enables you to
convert model checkpoints to an ONNX graph by leveraging configuration objects.
These configuration objects come ready made for a number of model architectures,
and are designed to be easily extendable to other architectures.

Ready-made configurations include the following architectures:
<!--This table is automatically generated by make style, do not fill manually!-->

@@ -50,24 +61,30 @@ Ready-made configurations include the following models:
- T5
- XLM-RoBERTa
The ONNX conversion is supported for the PyTorch versions of the models. If you
would like to be able to convert a TensorFlow model, please let us know by
opening an issue.

In the next two sections, we'll show you how to:

* Export a supported model using the `transformers.onnx` package.
* Export a custom model for an unsupported architecture.

### Exporting a model to ONNX

To export a 🤗 Transformers model to ONNX, you'll first need to install some
extra dependencies:

```bash
pip install transformers[onnx]
```

The `transformers.onnx` package can then be used as a Python module:
```bash
python -m transformers.onnx --help
usage: Hugging Face Transformers ONNX exporter [-h] -m MODEL [--feature {causal-lm, ...}] [--opset OPSET] [--atol ATOL] output

positional arguments:
  output                Path indicating where to store generated ONNX model.
@@ -75,231 +92,322 @@ positional arguments:
optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Model ID on huggingface.co or path on disk to load model from.
  --feature {causal-lm, ...}
                        The type of features to export the model with.
  --opset OPSET         ONNX opset version to export the model with.
  --atol ATOL           Absolute difference tolerance when validating the model.
```
Exporting a checkpoint using a ready-made configuration can be done as follows:

```bash
python -m transformers.onnx --model=distilbert-base-uncased onnx/
```

which should show the following logs:

```bash
Validating ONNX model...
        -[✓] ONNX model outputs' name match reference model ({'last_hidden_state'})
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 8, 768) matches (2, 8, 768)
                -[✓] all values close (atol: 1e-05)
All good, model saved at: onnx/model.onnx
```

This exports an ONNX graph of the checkpoint defined by the `--model` argument.
In this example it is `distilbert-base-uncased`, but it can be any model on the
Hugging Face Hub or one that's stored locally.
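The same command also works for checkpoints stored on disk. For example, you could first save a checkpoint locally with `save_pretrained()` (here `local-pt-checkpoint` is just a hypothetical folder name):

```python
>>> from transformers import AutoModel, AutoTokenizer

>>> model = AutoModel.from_pretrained("distilbert-base-uncased")
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> # Save the model and tokenizer to a local folder
>>> model.save_pretrained("local-pt-checkpoint")
>>> tokenizer.save_pretrained("local-pt-checkpoint")
```

and then point the `--model` argument at that folder:

```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```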
The resulting `model.onnx` file can then be run on one of the [many
accelerators](https://onnx.ai/supported-tools.html#deployModel) that support the
ONNX standard. For example, we can load and run the model with [ONNX
Runtime](https://onnxruntime.ai/) as follows:

```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```

The required output names (i.e. `["last_hidden_state"]`) can be obtained by
taking a look at the ONNX configuration of each model. For example, for
DistilBERT we have:

```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```
### Selecting features for different model topologies
Each ready-made configuration comes with a set of _features_ that enable you to
export models for different types of topologies or tasks. As shown in the table
below, each feature is associated with a different auto class:
| Feature | Auto Class |
| ------------------------------------ | ------------------------------------ |
| `causal-lm`, `causal-lm-with-past` | `AutoModelForCausalLM` |
| `default`, `default-with-past` | `AutoModel` |
| `masked-lm` | `AutoModelForMaskedLM` |
| `question-answering` | `AutoModelForQuestionAnswering` |
| `seq2seq-lm`, `seq2seq-lm-with-past` | `AutoModelForSeq2SeqLM` |
| `sequence-classification` | `AutoModelForSequenceClassification` |
| `token-classification` | `AutoModelForTokenClassification` |
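For instance, the `default` and `sequence-classification` features in the table correspond to loading a checkpoint with `AutoModel` and `AutoModelForSequenceClassification` respectively; here is a minimal sketch using checkpoints from this guide:

```python
>>> from transformers import AutoModel, AutoModelForSequenceClassification

>>> # `default` feature: the bare model, as loaded by AutoModel
>>> base_model = AutoModel.from_pretrained("distilbert-base-uncased")
>>> # `sequence-classification` feature: the model with a classification head
>>> clf_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
```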
For each configuration, you can find the list of supported features via the
`FeaturesManager`. For example, for DistilBERT we have:
```python
>>> from transformers.onnx.features import FeaturesManager

>>> distilbert_features = list(FeaturesManager.get_supported_features_for_model_type("distilbert").keys())
>>> print(distilbert_features)
["default", "masked-lm", "causal-lm", "sequence-classification", "token-classification", "question-answering"]
```
You can then pass one of these features to the `--feature` argument in the
`transformers.onnx` package. For example, to export a text-classification model
we can pick a fine-tuned model from the Hub and run:

```bash
python -m transformers.onnx --model=distilbert-base-uncased-finetuned-sst-2-english \
                            --feature=sequence-classification onnx/
```

which will display the following logs:

```bash
Validating ONNX model...
        -[✓] ONNX model outputs' name match reference model ({'logits'})
        - Validating ONNX Model output "logits":
                -[✓] (2, 2) matches (2, 2)
                -[✓] all values close (atol: 1e-05)
All good, model saved at: onnx/model.onnx
```

Notice that in this case, the output names from the fine-tuned model are
`logits` instead of the `last_hidden_state` we saw with the
`distilbert-base-uncased` checkpoint earlier. This is expected since the
fine-tuned model has a sequence classification head.
<Tip>
The features that have a `with-past` suffix (e.g. `causal-lm-with-past`)
correspond to model topologies with precomputed hidden states (key and values
in the attention blocks) that can be used for fast autoregressive decoding.
</Tip>
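For example, a causal language model with past key values could be exported by passing one of these features to the CLI; the command below is a sketch that assumes GPT-2 ships with a ready-made configuration supporting `causal-lm-with-past` in your version of 🤗 Transformers:

```bash
python -m transformers.onnx --model=gpt2 --feature=causal-lm-with-past onnx/
```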
### Exporting a model for an unsupported architecture

If you wish to export a model whose architecture is not natively supported by
the library, there are three main steps to follow:

1. Implement a custom ONNX configuration.
2. Export the model to ONNX.
3. Validate the outputs of the PyTorch and exported models.

In this section, we'll look at how DistilBERT was implemented to show what's
involved with each step.

#### Implementing a custom ONNX configuration

Let's start with the ONNX configuration object. We provide three abstract
classes that you should inherit from, depending on the type of model
architecture you wish to export:

* Encoder-based models inherit from [`OnnxConfig`](https://github.com/huggingface/transformers/blob/c4fa908fa98c3d538462c537d29b7613dd71306e/src/transformers/onnx/config.py#L52)
* Decoder-based models inherit from [`OnnxConfigWithPast`](https://github.com/huggingface/transformers/blob/c4fa908fa98c3d538462c537d29b7613dd71306e/src/transformers/onnx/config.py#L264)
* Encoder-decoder models inherit from [`OnnxSeq2SeqConfigWithPast`](https://github.com/huggingface/transformers/blob/c4fa908fa98c3d538462c537d29b7613dd71306e/src/transformers/onnx/config.py#L399)
<Tip>

A good way to implement a custom ONNX configuration is to look at the existing
implementation in the `configuration_<model_name>.py` file of a similar architecture.

</Tip>
Since DistilBERT is an encoder-based model, its configuration inherits from
`OnnxConfig`:

```python
>>> from typing import Mapping, OrderedDict
>>> from transformers.onnx import OnnxConfig

>>> class DistilBertOnnxConfig(OnnxConfig):
...     @property
...     def inputs(self) -> Mapping[str, Mapping[int, str]]:
...         return OrderedDict(
...             [
...                 ("input_ids", {0: "batch", 1: "sequence"}),
...                 ("attention_mask", {0: "batch", 1: "sequence"}),
...             ]
...         )
```
Every configuration object must implement the `inputs` property and return a
mapping, where each key corresponds to an expected input, and each value
indicates the axis of that input. For DistilBERT, we can see that two inputs are
required: `input_ids` and `attention_mask`. These inputs have the same shape of
`(batch_size, sequence_length)` which is why we see the same axes used in the
configuration.
<Tip>

Notice that the `inputs` property for `DistilBertOnnxConfig` returns an
`OrderedDict`. This ensures that the inputs are matched with their relative
position within the `PreTrainedModel.forward()` method when tracing the graph.
We recommend using an `OrderedDict` for the `inputs` and `outputs` properties
when implementing custom ONNX configurations.

</Tip>
Once you have implemented an ONNX configuration, you can instantiate it by
providing the base model's configuration as follows:

```python
>>> from transformers import AutoConfig

>>> config = AutoConfig.from_pretrained("distilbert-base-uncased")
>>> onnx_config = DistilBertOnnxConfig(config)
```

The resulting object has several useful properties. For example, you can view the
ONNX operator set that will be used during the export:

```python
>>> print(onnx_config.default_onnx_opset)
11
```
You can also view the outputs associated with the model as follows:

```python
>>> print(onnx_config.outputs)
OrderedDict([("last_hidden_state", {0: "batch", 1: "sequence"})])
```

Notice that the outputs property follows the same structure as the inputs; it
returns an `OrderedDict` of named outputs and their shapes. The output structure
is linked to the choice of feature that the configuration is initialized with.
By default, the ONNX configuration is initialized with the `default` feature
that corresponds to exporting a model loaded with the `AutoModel` class. If you
want to export a different model topology, just provide a different feature to
the `task` argument when you initialize the ONNX configuration. For example, if
we wished to export DistilBERT with a sequence classification head, we could
use:

```python
>>> from transformers import AutoConfig

>>> config = AutoConfig.from_pretrained("distilbert-base-uncased")
>>> onnx_config_for_seq_clf = DistilBertOnnxConfig(config, task="sequence-classification")
>>> print(onnx_config_for_seq_clf.outputs)
OrderedDict([('logits', {0: 'batch'})])
```
<Tip>

All of the base properties and methods associated with [`OnnxConfig`] and the
other configuration classes can be overridden if needed. Check out
[`BartOnnxConfig`] for an advanced example.

</Tip>
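To give a concrete flavour of what overriding looks like, a custom configuration could, for example, override the `default_onnx_opset` property we inspected above (a sketch, not code taken from the library):

```python
>>> class DistilBertOnnxConfigWithNewOpset(DistilBertOnnxConfig):
...     @property
...     def default_onnx_opset(self) -> int:
...         # Target a more recent opset than the base configuration (illustrative value)
...         return 13
```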
#### Exporting the model

Once you have implemented the ONNX configuration, the next step is to export the
model. Here we can use the `export()` function provided by the
`transformers.onnx` package. This function expects the ONNX configuration, along
with the base model and tokenizer, and the path to save the exported file:

```python
>>> from pathlib import Path
>>> from transformers.onnx import export
>>> from transformers import AutoTokenizer, AutoModel

>>> onnx_path = Path("model.onnx")
>>> model_ckpt = "distilbert-base-uncased"
>>> base_model = AutoModel.from_pretrained(model_ckpt)
>>> tokenizer = AutoTokenizer.from_pretrained(model_ckpt)

>>> onnx_inputs, onnx_outputs = export(tokenizer, base_model, onnx_config, onnx_config.default_onnx_opset, onnx_path)
```
The `onnx_inputs` and `onnx_outputs` returned by the `export()` function are
lists of the keys defined in the `inputs` and `outputs` properties of the
configuration. Once the model is exported, you can test that the model is well
formed as follows:
```python
>>> import onnx

>>> onnx_model = onnx.load("model.onnx")
>>> onnx.checker.check_model(onnx_model)
```
<Tip>

If your model is larger than 2GB, you will see that many additional files are
created during the export. This is _expected_ because ONNX uses [Protocol
Buffers](https://developers.google.com/protocol-buffers/) to store the model and
these have a size limit of 2GB. See the [ONNX
documentation](https://github.com/onnx/onnx/blob/master/docs/ExternalData.md)
for instructions on how to load models with external data.

</Tip>
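For reference, loading such a model together with its external data can look roughly like the following (a sketch based on the `onnx` package's external-data helpers; check the linked documentation for the exact API in your version):

```python
>>> import onnx
>>> from onnx.external_data_helper import load_external_data_for_model

>>> # Load the graph structure first, then attach the external weight files from the same folder
>>> onnx_model = onnx.load("model.onnx", load_external_data=False)
>>> load_external_data_for_model(onnx_model, ".")
```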
#### Validating the model outputs

The final step is to validate that the outputs from the base and exported model
agree within some absolute tolerance. Here we can use the
`validate_model_outputs()` function provided by the `transformers.onnx` package
as follows:
```python
>>> from transformers.onnx import validate_model_outputs

>>> validate_model_outputs(
...     onnx_config, tokenizer, base_model, onnx_path, onnx_outputs, onnx_config.atol_for_validation
... )
```
This function uses the `OnnxConfig.generate_dummy_inputs()` method to generate
inputs for the base and exported model, and the absolute tolerance can be
defined in the configuration. We generally find numerical agreement in the 1e-6
to 1e-4 range, although anything smaller than 1e-3 is likely to be OK.
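If you want to inspect those dummy inputs, you can call `generate_dummy_inputs()` yourself; the snippet below is a sketch that assumes the method accepts the tokenizer and a `framework` argument (the exact signature may differ across versions):

```python
>>> from transformers import TensorType

>>> # Generate the same kind of dummy inputs that validation uses (assumed signature)
>>> dummy_inputs = onnx_config.generate_dummy_inputs(tokenizer, framework=TensorType.PYTORCH)
>>> print(list(dummy_inputs.keys()))
['input_ids', 'attention_mask']
```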
### Contributing a new configuration to 🤗 Transformers

We are looking to expand the set of ready-made configurations and welcome
contributions from the community! If you would like to contribute your addition
to the library, you will need to:

* Implement the ONNX configuration in the corresponding `configuration_<model_name>.py` file
* Include the model architecture and corresponding features in [`onnx.features.FeaturesManager`](https://github.com/huggingface/transformers/blob/c4fa908fa98c3d538462c537d29b7613dd71306e/src/transformers/onnx/features.py#L71)
* Add your model architecture to the tests in `test_onnx_v2.py`

Check out how the configuration for [IBERT was
contributed](https://github.com/huggingface/transformers/pull/14868/files) to
get an idea of what's involved.
## TorchScript
@@ -23,17 +23,17 @@ from .features import FeaturesManager

def main():
    parser = ArgumentParser("Hugging Face Transformers ONNX exporter")
    parser.add_argument(
        "-m", "--model", type=str, required=True, help="Model ID on huggingface.co or path on disk to load model from."
    )
    parser.add_argument(
        "--feature",
        choices=list(FeaturesManager.AVAILABLE_FEATURES),
        default="default",
        help="The type of features to export the model with.",
    )
    parser.add_argument("--opset", type=int, default=None, help="ONNX opset version to export the model with.")
    parser.add_argument(
        "--atol", type=float, default=None, help="Absolute difference tolerance when validating the model."
    )
@@ -207,11 +207,11 @@ def get_onnx_model_list():

def check_onnx_model_list(overwrite=False):
    """Check the model list in the serialization.mdx is consistent with the state of the lib and maybe `overwrite`."""
    current_list, start_index, end_index, lines = _find_text_in_file(
        filename=os.path.join(PATH_TO_DOCS, "serialization.mdx"),
        start_prompt="<!--This table is automatically generated by make style, do not fill manually!-->",
        end_prompt="The ONNX conversion is supported for the PyTorch versions of the models.",
    )
    new_list = get_onnx_model_list()