<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Export to ONNX

Deploying 馃 Transformers models in production environments often requires, or can benefit from exporting the models into 
a serialized format that can be loaded and executed on specialized runtimes and hardware.

馃 Optimum is an extension of Transformers that enables exporting models from PyTorch or TensorFlow to serialized formats 
such as ONNX and TFLite through its `exporters` module. 馃 Optimum also provides a set of performance optimization tools to train 
and run models on targeted hardware with maximum efficiency.

This guide demonstrates how you can export 🤗 Transformers models to ONNX with 🤗 Optimum; for the guide on exporting models to TFLite,
please refer to the [Export to TFLite page](tflite).

## Export to ONNX 

[ONNX (Open Neural Network eXchange)](http://onnx.ai) is an open standard that defines a common set of operators and a 
common file format to represent deep learning models in a wide variety of frameworks, including PyTorch and
TensorFlow. When a model is exported to the ONNX format, these operators are used to
construct a computational graph (often called an _intermediate representation_) which
represents the flow of data through the neural network.

By exposing a graph with standardized operators and data types, ONNX makes it easy to
switch between frameworks. For example, a model trained in PyTorch can be exported to
ONNX format and then imported in TensorFlow (and vice versa).

Once exported to ONNX format, a model can be:
- optimized for inference via techniques such as [graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization) and [quantization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization).
- run with ONNX Runtime via [`ORTModelForXXX` classes](https://huggingface.co/docs/optimum/onnxruntime/package_reference/modeling_ort),
which follow the same `AutoModel` API as the one you are used to in 🤗 Transformers.
- run with [optimized inference pipelines](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines),
which have the same API as the [`pipeline`] function in 🤗 Transformers (see the sketch after this list).
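
For illustration, here is a minimal sketch of the last two points, assuming `optimum.onnxruntime` and `optimum.pipelines` are available; the checkpoint name is only an example and any sequence classification checkpoint would work:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> from optimum.pipelines import pipeline

>>> # Export the checkpoint to ONNX on the fly and wrap it in an ONNX Runtime pipeline
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint, swap in your own
>>> ort_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> onnx_classifier = pipeline("text-classification", model=ort_model, tokenizer=tokenizer, accelerator="ort")
>>> onnx_classifier("Exporting 🤗 Transformers models to ONNX is straightforward.")
```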

馃 Optimum provides support for the ONNX export by leveraging configuration objects. These configuration objects come 
ready-made for a number of model architectures, and are designed to be easily extendable to other architectures.

For the list of ready-made configurations, please refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/overview).

There are two ways to export a 🤗 Transformers model to ONNX; here we show both:

- export with 馃 Optimum via CLI.
- export with 馃 Optimum with `optimum.onnxruntime`.

### Exporting a 馃 Transformers model to ONNX with CLI

To export a 馃 Transformers model to ONNX, first install an extra dependency:

```bash
pip install optimum[exporters]
```

To check out all available arguments, refer to the [🤗 Optimum docs](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli),
or view the help in the command line:

```bash
optimum-cli export onnx --help
```

To export a model's checkpoint from the 🤗 Hub, for example, `distilbert-base-uncased-distilled-squad`, run the following command:

```bash
optimum-cli export onnx --model distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
```

You should see logs indicating progress and showing where the resulting `model.onnx` is saved, like this:

```bash
Validating ONNX model distilbert_base_uncased_squad_onnx/model.onnx...
    -[✓] ONNX model output names match reference model (start_logits, end_logits)
    - Validating ONNX Model output "start_logits":
        -[✓] (2, 16) matches (2, 16)
        -[✓] all values close (atol: 0.0001)
    - Validating ONNX Model output "end_logits":
        -[✓] (2, 16) matches (2, 16)
        -[✓] all values close (atol: 0.0001)
The ONNX export succeeded and the exported model was saved at: distilbert_base_uncased_squad_onnx
```

The example above illustrates exporting a checkpoint from the 🤗 Hub. When exporting a local model, first make sure that you
saved both the model's weights and tokenizer files in the same directory (`local_path`). When using the CLI, pass the
`local_path` to the `model` argument instead of the checkpoint name on the 🤗 Hub and provide the `--task` argument.
You can review the list of supported tasks in the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/task_manager).
If the `task` argument is not provided, it will default to the model architecture without any task-specific head.

```bash
optimum-cli export onnx --model local_path --task question-answering distilbert_base_uncased_squad_onnx/
```
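
If you haven't saved a local checkpoint yet, a directory like `local_path` can be prepared with `save_pretrained`. The following is a minimal sketch, assuming a PyTorch DistilBERT checkpoint fine-tuned for question answering:

```python
>>> from transformers import AutoModelForQuestionAnswering, AutoTokenizer

>>> # Save the weights and tokenizer files side by side so the exporter can find both
>>> model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased-distilled-squad")
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
>>> model.save_pretrained("local_path")
>>> tokenizer.save_pretrained("local_path")
```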

The resulting `model.onnx` file can then be run on one of the [many
accelerators](https://onnx.ai/supported-tools.html#deployModel) that support the ONNX
standard. For example, we can load and run the model with [ONNX
Runtime](https://onnxruntime.ai/) as follows:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> model = ORTModelForQuestionAnswering.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> inputs = tokenizer("What am I using?", "Using DistilBERT with ONNX Runtime!", return_tensors="pt")
>>> outputs = model(**inputs)
```

The process is identical for TensorFlow checkpoints on the Hub. For instance, here's how you would
export a pure TensorFlow checkpoint from the [Keras organization](https://huggingface.co/keras-io):

```bash
optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_squad_onnx/
```

### Exporting a 馃 Transformers model to ONNX with `optimum.onnxruntime`

As an alternative to the CLI, you can export a 🤗 Transformers model to ONNX programmatically like so:

```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> from transformers import AutoTokenizer

>>> model_checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_directory = "onnx/"

>>> # Load a model from transformers and export it to ONNX
>>> ort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint, export=True)
>>> tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

>>> # Save the onnx model and tokenizer
>>> ort_model.save_pretrained(save_directory)
>>> tokenizer.save_pretrained(save_directory)
```

### Exporting a model for an unsupported architecture

If you wish to contribute by adding support for a model that cannot currently be exported, you should first check if it is
supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview),
and if it is not, [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute)
directly.
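
As a quick check, you can list the tasks the exporter already covers for a given architecture. This is a minimal sketch and an assumption on my part: it relies on `TasksManager` from `optimum.exporters.tasks`, and the exact API may differ between 🤗 Optimum releases:

```python
>>> from optimum.exporters.tasks import TasksManager

>>> # List the tasks already covered by the ONNX exporter for the "distilbert" model type
>>> supported_tasks = TasksManager.get_supported_tasks_for_model_type("distilbert", exporter="onnx")
>>> print(list(supported_tasks.keys()))
```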

### Exporting a model with `transformers.onnx`

<Tip warning={true}>

`transformers.onnx` is no longer maintained; please export models with 🤗 Optimum as described above. This section will be removed in future versions.

</Tip>

To export a 馃 Transformers model to ONNX with `tranformers.onnx`, install extra dependencies:

```bash
pip install transformers[onnx]
```

Use the `transformers.onnx` package as a Python module to export a checkpoint using a ready-made configuration:

```bash
python -m transformers.onnx --model=distilbert-base-uncased onnx/
```

This exports an ONNX graph of the checkpoint defined by the `--model` argument. Pass any checkpoint on the 🤗 Hub or one that's stored locally.
The resulting `model.onnx` file can then be run on one of the many accelerators that support the ONNX standard. For example, 
load and run the model with ONNX Runtime as follows:

```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```

The required output names (like `["last_hidden_state"]`) can be obtained by taking a look at the ONNX configuration of 
each model. For example, for DistilBERT we have:

```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```

The process is identical for TensorFlow checkpoints on the Hub. For example, export a pure TensorFlow checkpoint like so:

```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```

To export a model that's stored locally, save the model's weights and tokenizer files in the same directory (e.g. `local-pt-checkpoint`), 
then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package to the desired directory:

```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```