"tests/models/bertweet/test_tokenization_bertweet.py" did not exist on "158e82e061c02fc2f1613adb7ac1d1cb6adae71c"
serialization.md 9.75 KB
Newer Older
Sylvain Gugger's avatar
Sylvain Gugger committed
1
2
3
4
5
6
7
8
9
10
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Export to ONNX

Deploying 馃 Transformers models in production environments often requires, or can benefit from exporting the models into 
a serialized format that can be loaded and executed on specialized runtimes and hardware.

馃 Optimum is an extension of Transformers that enables exporting models from PyTorch or TensorFlow to serialized formats 
such as ONNX and TFLite through its `exporters` module. 馃 Optimum also provides a set of performance optimization tools to train 
and run models on targeted hardware with maximum efficiency.

This guide demonstrates how you can export 🤗 Transformers models to ONNX with 🤗 Optimum. For the guide on exporting models to TFLite,
please refer to the [Export to TFLite page](tflite).

## Export to ONNX 

[ONNX (Open Neural Network eXchange)](http://onnx.ai) is an open standard that defines a common set of operators and a 
common file format to represent deep learning models in a wide variety of frameworks, including PyTorch and
TensorFlow. When a model is exported to the ONNX format, these operators are used to
construct a computational graph (often called an _intermediate representation_) which
represents the flow of data through the neural network.

By exposing a graph with standardized operators and data types, ONNX makes it easy to
switch between frameworks. For example, a model trained in PyTorch can be exported to
ONNX format and then imported in TensorFlow (and vice versa).

Once exported to ONNX format, a model can be:
- optimized for inference via techniques such as [graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization) and [quantization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization) (see the sketch below).
- run with ONNX Runtime via [`ORTModelForXXX` classes](https://huggingface.co/docs/optimum/onnxruntime/package_reference/modeling_ort),
which follow the same `AutoModel` API as the one you are used to in 🤗 Transformers.
- run with [optimized inference pipelines](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines),
which have the same API as the [`pipeline`] function in 🤗 Transformers.
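
For example, a model exported to ONNX can be dynamically quantized for faster CPU inference with 🤗 Optimum's ONNX Runtime tools. The following is a minimal sketch (the checkpoint and the AVX2 quantization settings are illustrative; see the quantization guide linked above for the full API):

```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
>>> from optimum.onnxruntime.configuration import AutoQuantizationConfig

>>> # Export a checkpoint to ONNX on the fly (example checkpoint, any supported model works)
>>> ort_model = ORTModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", export=True)

>>> # Dynamic (weight-only) quantization targeting AVX2 CPUs
>>> qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)

>>> # Quantize the exported model and save the result
>>> quantizer = ORTQuantizer.from_pretrained(ort_model)
>>> quantizer.quantize(save_dir="onnx_quantized/", quantization_config=qconfig)
```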

🤗 Optimum provides support for the ONNX export by leveraging configuration objects. These configuration objects come
ready-made for a number of model architectures, and are designed to be easily extendable to other architectures.

For the list of ready-made configurations, please refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/overview).

There are two ways to export a 🤗 Transformers model to ONNX; here we show both:

- export with 🤗 Optimum via the CLI.
- export with 🤗 Optimum via `optimum.onnxruntime`.

### Exporting a 🤗 Transformers model to ONNX with CLI

To export a 馃 Transformers model to ONNX, first install an extra dependency:

```bash
pip install optimum[exporters]
```

To check out all available arguments, refer to the [🤗 Optimum docs](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli),
or view the help in the command line:

```bash
optimum-cli export onnx --help
```

To export a model's checkpoint from the 🤗 Hub, for example, `distilbert-base-uncased-distilled-squad`, run the following command:

```bash
optimum-cli export onnx --model distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
```

You should see the logs indicating progress and showing where the resulting `model.onnx` is saved, like this:

```bash
Validating ONNX model distilbert_base_uncased_squad_onnx/model.onnx...
	-[✓] ONNX model output names match reference model (start_logits, end_logits)
	- Validating ONNX Model output "start_logits":
		-[✓] (2, 16) matches (2, 16)
		-[✓] all values close (atol: 0.0001)
	- Validating ONNX Model output "end_logits":
		-[✓] (2, 16) matches (2, 16)
		-[✓] all values close (atol: 0.0001)
The ONNX export succeeded and the exported model was saved at: distilbert_base_uncased_squad_onnx
```

The example above illustrates exporting a checkpoint from the 🤗 Hub. When exporting a local model, first make sure that you
saved both the model's weights and tokenizer files in the same directory (`local_path`). When using the CLI, pass the
`local_path` to the `model` argument instead of the checkpoint name on the 🤗 Hub and provide the `--task` argument.
You can review the list of supported tasks in the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/task_manager).
If the `--task` argument is not provided, it will default to the model architecture without any task-specific head.
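
For instance, you can first save a fine-tuned model and its tokenizer to `local_path` with `save_pretrained`. A minimal sketch (the checkpoint name is only an example):

```python
>>> from transformers import AutoModelForQuestionAnswering, AutoTokenizer

>>> model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased-distilled-squad")
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")

>>> # Save the weights and the tokenizer files in the same directory
>>> model.save_pretrained("local_path")
>>> tokenizer.save_pretrained("local_path")
```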

```bash
optimum-cli export onnx --model local_path --task question-answering distilbert_base_uncased_squad_onnx/
```

The resulting `model.onnx` file can then be run on one of the [many
accelerators](https://onnx.ai/supported-tools.html#deployModel) that support the ONNX
standard. For example, we can load and run the model with [ONNX
Runtime](https://onnxruntime.ai/) as follows:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> model = ORTModelForQuestionAnswering.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> inputs = tokenizer("What am I using?", "Using DistilBERT with ONNX Runtime!", return_tensors="pt")
>>> outputs = model(**inputs)
```
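
From there you can post-process the outputs as you would with any 🤗 Transformers question answering model. A minimal sketch for decoding the answer span (assuming the PyTorch tensors returned above):

```python
>>> import torch

>>> # Pick the most likely start and end positions and decode the answer span
>>> start = int(torch.argmax(outputs.start_logits))
>>> end = int(torch.argmax(outputs.end_logits))
>>> print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
```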

The process is identical for TensorFlow checkpoints on the Hub. For instance, here's how you would
export a pure TensorFlow checkpoint from the [Keras organization](https://huggingface.co/keras-io):

```bash
optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_squad_onnx/
```

### Exporting a 🤗 Transformers model to ONNX with `optimum.onnxruntime`

As an alternative to the CLI, you can export a 🤗 Transformers model to ONNX programmatically like so:

```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> from transformers import AutoTokenizer

>>> model_checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_directory = "onnx/"

>>> # Load a model from transformers and export it to ONNX
>>> ort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint, export=True)
>>> tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

>>> # Save the onnx model and tokenizer
>>> ort_model.save_pretrained(save_directory)
>>> tokenizer.save_pretrained(save_directory)
```
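
The exported model and tokenizer can then be reloaded from `save_directory` and used like a regular 🤗 Transformers model. A minimal sketch (the input sentence is illustrative):

```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> from transformers import AutoTokenizer

>>> # Reload the ONNX model and tokenizer from the save directory
>>> model = ORTModelForSequenceClassification.from_pretrained("onnx/")
>>> tokenizer = AutoTokenizer.from_pretrained("onnx/")

>>> inputs = tokenizer("Exporting to ONNX was painless!", return_tensors="pt")
>>> outputs = model(**inputs)
```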

### Exporting a model for an unsupported architecture

If you wish to contribute by adding support for a model that cannot currently be exported, you should first check if it is
supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview),
and if it is not, [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute)
directly.
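
At the heart of such a contribution is an ONNX export configuration that declares the model's inputs and their dynamic axes. The sketch below is only illustrative: the base class, helper, and attribute names are assumptions loosely based on existing configurations, so follow the contribution guide above for the actual 🤗 Optimum interface.

```python
# Illustrative sketch only; class and attribute names are assumptions, see the contribution guide.
from optimum.exporters.onnx.config import TextEncoderOnnxConfig  # assumed base class
from optimum.utils import NormalizedTextConfig  # assumed config-normalization helper


class MyModelOnnxConfig(TextEncoderOnnxConfig):
    # Maps the model config's attribute names onto a normalized naming scheme
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig
    DEFAULT_ONNX_OPSET = 13

    @property
    def inputs(self):
        # Declare the exported graph's inputs and which axes are dynamic
        return {
            "input_ids": {0: "batch_size", 1: "sequence_length"},
            "attention_mask": {0: "batch_size", 1: "sequence_length"},
        }
```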

### Exporting a model with `transformers.onnx`

<Tip warning={true}>

`transformers.onnx` is no longer maintained; please export models with 🤗 Optimum as described above. This section will be removed in future versions.

</Tip>

To export a 馃 Transformers model to ONNX with `tranformers.onnx`, install extra dependencies:

```bash
pip install transformers[onnx]
```

Use the `transformers.onnx` package as a Python module to export a checkpoint using a ready-made configuration:

```bash
python -m transformers.onnx --model=distilbert-base-uncased onnx/
```

This exports an ONNX graph of the checkpoint defined by the `--model` argument. Pass any checkpoint on the 🤗 Hub or one that's stored locally.
The resulting `model.onnx` file can then be run on one of the many accelerators that support the ONNX standard. For example,
load and run the model with ONNX Runtime as follows:

```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```

The required output names (like `["last_hidden_state"]`) can be obtained by taking a look at the ONNX configuration of 
each model. For example, for DistilBERT we have:

```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```
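
The same configuration also declares the expected input names and their dynamic axes. A short sketch (for DistilBERT the inputs are expected to be `input_ids` and `attention_mask`):

```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

>>> onnx_config = DistilBertOnnxConfig(DistilBertConfig())
>>> print(list(onnx_config.inputs.keys()))
["input_ids", "attention_mask"]
```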

The process is identical for TensorFlow checkpoints on the Hub. For example, export a pure TensorFlow checkpoint like so:

```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```

To export a model that's stored locally, save the model's weights and tokenizer files in the same directory (e.g. `local-pt-checkpoint`), 
then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package to the desired directory:

```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```