.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Benchmarks
=======================================================================================================================

Let's take a look at how 🤗 Transformer models can be benchmarked, along with best practices and already available
benchmarks.

A notebook explaining in more detail how to benchmark 🤗 Transformer models can be found `here
<https://github.com/huggingface/transformers/blob/master/notebooks/05-benchmark.ipynb>`__.

How to benchmark 🤗 Transformer models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The classes :class:`~transformers.PyTorchBenchmark` and :class:`~transformers.TensorFlowBenchmark` make it possible to
flexibly benchmark 🤗 Transformer models. The benchmark classes allow us to measure the `peak memory usage` and
`required time` for both `inference` and `training`.

.. note::

  Here, `inference` is defined as a single forward pass, and `training` is defined as a single forward pass and
  backward pass.

The benchmark classes :class:`~transformers.PyTorchBenchmark` and :class:`~transformers.TensorFlowBenchmark` expect an
object of type :class:`~transformers.PyTorchBenchmarkArguments` and
:class:`~transformers.TensorFlowBenchmarkArguments`, respectively, for instantiation.
:class:`~transformers.PyTorchBenchmarkArguments` and :class:`~transformers.TensorFlowBenchmarkArguments` are data
classes and contain all relevant configurations for their corresponding benchmark class. The following example shows
how a BERT model of type `bert-base-uncased` can be benchmarked.

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

    >>> args = PyTorchBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
    >>> benchmark = PyTorchBenchmark(args)

    >>> ## TENSORFLOW CODE
    >>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments

    >>> args = TensorFlowBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
    >>> benchmark = TensorFlowBenchmark(args)


Here, three arguments are given to the benchmark argument data classes, namely ``models``, ``batch_sizes``, and
``sequence_lengths``. The argument ``models`` is required and expects a :obj:`list` of model identifiers from the
`model hub <https://huggingface.co/models>`__. The :obj:`list` arguments ``batch_sizes`` and ``sequence_lengths``
define the size of the ``input_ids`` on which the model is benchmarked. There are many more parameters that can be
configured via the benchmark argument data classes. For more detail on these, one can directly consult the files
``src/transformers/benchmark/benchmark_args_utils.py``, ``src/transformers/benchmark/benchmark_args.py`` (for PyTorch)
and ``src/transformers/benchmark/benchmark_args_tf.py`` (for TensorFlow). Alternatively, running the following shell
commands from the repository root will print out a descriptive list of all configurable parameters for PyTorch and
TensorFlow respectively.

.. code-block:: bash

    ## PYTORCH CODE
    python examples/benchmarking/run_benchmark.py --help

    ## TENSORFLOW CODE
    python examples/benchmarking/run_benchmark_tf.py --help
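
As one minimal sketch of such a parameter, half precision can be switched on via the ``fp16`` field of the argument
data classes (its exact name can be verified via the ``--help`` commands above or in ``benchmark_args_utils.py``):

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

    >>> # fp16=True is assumed to benchmark the model in half precision;
    >>> # all other fields keep their default values.
    >>> args = PyTorchBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32], fp16=True)
    >>> benchmark = PyTorchBenchmark(args)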


An instantiated benchmark object can then simply be run by calling ``benchmark.run()``.

.. code-block::

    >>> ## PYTORCH CODE
    >>> results = benchmark.run()
    >>> print(results)
    ====================       INFERENCE - SPEED - RESULT       ====================
    --------------------------------------------------------------------------------
    Model Name             Batch Size     Seq Length     Time in s                  
    --------------------------------------------------------------------------------
    bert-base-uncased          8               8             0.006     
    bert-base-uncased          8               32            0.006     
    bert-base-uncased          8              128            0.018     
    bert-base-uncased          8              512            0.088     
    --------------------------------------------------------------------------------

    ====================      INFERENCE - MEMORY - RESULT       ====================
    --------------------------------------------------------------------------------
    Model Name             Batch Size     Seq Length    Memory in MB 
    --------------------------------------------------------------------------------
    bert-base-uncased          8               8             1227
    bert-base-uncased          8               32            1281
    bert-base-uncased          8              128            1307
    bert-base-uncased          8              512            1539
    --------------------------------------------------------------------------------

    ====================        ENVIRONMENT INFORMATION         ====================
    - transformers_version: 2.11.0
    - framework: PyTorch
    - use_torchscript: False
    - framework_version: 1.4.0
    - python_version: 3.6.10
    - system: Linux
    - cpu: x86_64
    - architecture: 64bit
    - date: 2020-06-29
    - time: 08:58:43.371351
    - fp16: False
    - use_multiprocessing: True
    - only_pretrain_model: False
    - cpu_ram_mb: 32088
    - use_gpu: True
    - num_gpus: 1
    - gpu: TITAN RTX
    - gpu_ram_mb: 24217
    - gpu_power_watts: 280.0
    - gpu_performance_state: 2
    - use_tpu: False

    >>> ## TENSORFLOW CODE
    >>> results = benchmark.run()
    >>> print(results)
    ====================       INFERENCE - SPEED - RESULT       ====================
    --------------------------------------------------------------------------------
    Model Name             Batch Size     Seq Length     Time in s                  
    --------------------------------------------------------------------------------
    bert-base-uncased          8               8             0.005
    bert-base-uncased          8               32            0.008
    bert-base-uncased          8              128            0.022
    bert-base-uncased          8              512            0.105
    --------------------------------------------------------------------------------

    ====================      INFERENCE - MEMORY - RESULT       ====================
    --------------------------------------------------------------------------------
    Model Name             Batch Size     Seq Length    Memory in MB 
    --------------------------------------------------------------------------------
    bert-base-uncased          8               8             1330
    bert-base-uncased          8               32            1330
    bert-base-uncased          8              128            1330
    bert-base-uncased          8              512            1770
    --------------------------------------------------------------------------------

    ====================        ENVIRONMENT INFORMATION         ====================
    - transformers_version: 2.11.0
    - framework: Tensorflow
    - use_xla: False
    - framework_version: 2.2.0
    - python_version: 3.6.10
    - system: Linux
    - cpu: x86_64
    - architecture: 64bit
    - date: 2020-06-29
    - time: 09:26:35.617317
    - fp16: False
    - use_multiprocessing: True
    - only_pretrain_model: False
    - cpu_ram_mb: 32088
    - use_gpu: True
    - num_gpus: 1
    - gpu: TITAN RTX
    - gpu_ram_mb: 24217
    - gpu_power_watts: 280.0
    - gpu_performance_state: 2
    - use_tpu: False

By default, the `time` and the `required memory` for `inference` are benchmarked. In the example output above, the
first two sections show the results corresponding to `inference time` and `inference memory`. In addition, all
relevant information about the computing environment, `e.g.` the GPU type, the system, the library versions, etc., is
printed out in the third section under `ENVIRONMENT INFORMATION`. This information can optionally be saved in a `.csv`
file by adding the argument :obj:`save_to_csv=True` to :class:`~transformers.PyTorchBenchmarkArguments` and
:class:`~transformers.TensorFlowBenchmarkArguments` respectively. In this case, every section is saved in a separate
`.csv` file. The path to each `.csv` file can optionally be defined via the argument data classes.
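
As a minimal sketch of this, assuming the csv file paths are exposed under the names used in
``benchmark_args_utils.py``, `e.g.` ``inference_time_csv_file``:

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

    >>> # save_to_csv=True writes each result section to its own .csv file; the
    >>> # *_csv_file arguments (assumed names) override the default output paths.
    >>> args = PyTorchBenchmarkArguments(
    ...     models=["bert-base-uncased"],
    ...     batch_sizes=[8],
    ...     sequence_lengths=[8, 32, 128, 512],
    ...     save_to_csv=True,
    ...     inference_time_csv_file="inference_time.csv",
    ...     inference_memory_csv_file="inference_memory.csv",
    ...     env_info_csv_file="env_info.csv",
    ... )
    >>> results = PyTorchBenchmark(args).run()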

Instead of benchmarking pre-trained models via their model identifier, `e.g.` `bert-base-uncased`, the user can
alternatively benchmark an arbitrary configuration of any available model class. In this case, a :obj:`list` of
configurations must be passed to the benchmark args as follows.

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig

    >>> args = PyTorchBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
    >>> config_base = BertConfig()
    >>> config_384_hid = BertConfig(hidden_size=384)
    >>> config_6_lay = BertConfig(num_hidden_layers=6)

    >>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
    >>> benchmark.run()
    ====================       INFERENCE - SPEED - RESULT       ====================
    --------------------------------------------------------------------------------
    Model Name             Batch Size     Seq Length       Time in s                  
    --------------------------------------------------------------------------------
    bert-base                  8               8             0.006
    bert-base                  8               32            0.006
    bert-base                  8              128            0.018     
    bert-base                  8              512            0.088     
    bert-384-hid              8               8             0.006     
    bert-384-hid              8               32            0.006     
    bert-384-hid              8              128            0.011     
    bert-384-hid              8              512            0.054     
    bert-6-lay                 8               8             0.003     
    bert-6-lay                 8               32            0.004     
    bert-6-lay                 8              128            0.009     
    bert-6-lay                 8              512            0.044
    --------------------------------------------------------------------------------

    ====================      INFERENCE - MEMORY - RESULT       ====================
    --------------------------------------------------------------------------------
    Model Name             Batch Size     Seq Length      Memory in MB 
    --------------------------------------------------------------------------------
    bert-base                  8               8             1277
    bert-base                  8               32            1281
    bert-base                  8              128            1307     
    bert-base                  8              512            1539     
    bert-384-hid              8               8             1005     
    bert-384-hid              8               32            1027     
    bert-384-hid              8              128            1035     
    bert-384-hid              8              512            1255     
    bert-6-lay                 8               8             1097     
    bert-6-lay                 8               32            1101     
    bert-6-lay                 8              128            1127     
    bert-6-lay                 8              512            1359
    --------------------------------------------------------------------------------

    ====================        ENVIRONMENT INFORMATION         ====================
    - transformers_version: 2.11.0
    - framework: PyTorch
    - use_torchscript: False
    - framework_version: 1.4.0
    - python_version: 3.6.10
    - system: Linux
    - cpu: x86_64
    - architecture: 64bit
    - date: 2020-06-29
    - time: 09:35:25.143267
    - fp16: False
    - use_multiprocessing: True
    - only_pretrain_model: False
    - cpu_ram_mb: 32088
    - use_gpu: True
    - num_gpus: 1
    - gpu: TITAN RTX
    - gpu_ram_mb: 24217
    - gpu_power_watts: 280.0
    - gpu_performance_state: 2
    - use_tpu: False

    >>> ## TENSORFLOW CODE
    >>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig

    >>> args = TensorFlowBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
    >>> config_base = BertConfig()
    >>> config_384_hid = BertConfig(hidden_size=384)
    >>> config_6_lay = BertConfig(num_hidden_layers=6)

    >>> benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
    >>> benchmark.run()
    ====================       INFERENCE - SPEED - RESULT       ====================
    --------------------------------------------------------------------------------
    Model Name             Batch Size     Seq Length       Time in s                  
    --------------------------------------------------------------------------------
    bert-base                  8               8             0.005
    bert-base                  8               32            0.008
    bert-base                  8              128            0.022
    bert-base                  8              512            0.106
    bert-384-hid              8               8             0.005
    bert-384-hid              8               32            0.007
    bert-384-hid              8              128            0.018
    bert-384-hid              8              512            0.064
    bert-6-lay                 8               8             0.002
    bert-6-lay                 8               32            0.003
    bert-6-lay                 8              128            0.0011
    bert-6-lay                 8              512            0.074
    --------------------------------------------------------------------------------

    ====================      INFERENCE - MEMORY - RESULT       ====================
    --------------------------------------------------------------------------------
    Model Name             Batch Size     Seq Length      Memory in MB 
    --------------------------------------------------------------------------------
    bert-base                  8               8             1330
    bert-base                  8               32            1330
    bert-base                  8              128            1330
    bert-base                  8              512            1770
    bert-384-hid              8               8             1330
    bert-384-hid              8               32            1330
    bert-384-hid              8              128            1330
    bert-384-hid              8              512            1540
    bert-6-lay                 8               8             1330
    bert-6-lay                 8               32            1330
    bert-6-lay                 8              128            1330
    bert-6-lay                 8              512            1540
    --------------------------------------------------------------------------------

    ====================        ENVIRONMENT INFORMATION         ====================
    - transformers_version: 2.11.0
    - framework: Tensorflow
    - use_xla: False
    - framework_version: 2.2.0
    - python_version: 3.6.10
    - system: Linux
    - cpu: x86_64
    - architecture: 64bit
    - date: 2020-06-29
    - time: 09:38:15.487125
    - fp16: False
    - use_multiprocessing: True
    - only_pretrain_model: False
    - cpu_ram_mb: 32088
    - use_gpu: True
    - num_gpus: 1
    - gpu: TITAN RTX
    - gpu_ram_mb: 24217
    - gpu_power_watts: 280.0
    - gpu_performance_state: 2
    - use_tpu: False


Again, `inference time` and `required memory` for `inference` are measured, but this time for customized
configurations of the :obj:`BertModel` class. This feature can be especially helpful when deciding which configuration
the model should be trained with.
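
To base that decision on `training` rather than `inference`, the same comparison can be run with training benchmarks
enabled. A minimal sketch, assuming the ``training`` flag of the argument data classes (its exact name can be checked
via the ``--help`` commands above) switches on the measurement of a combined forward and backward pass:

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig

    >>> # training=True (assumed flag) additionally benchmarks a forward and
    >>> # backward pass for every configuration.
    >>> args = PyTorchBenchmarkArguments(models=["bert-base", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32], training=True)
    >>> config_base = BertConfig()
    >>> config_6_lay = BertConfig(num_hidden_layers=6)

    >>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_6_lay])
    >>> results = benchmark.run()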


Benchmark best practices
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section lists a couple of best practices one should be aware of when benchmarking a model.

- Currently, only single device benchmarking is supported. When benchmarking on GPU, it is recommended that the user
  specify on which device the code should be run by setting the ``CUDA_VISIBLE_DEVICES`` environment variable in the
  shell, `e.g.` ``export CUDA_VISIBLE_DEVICES=0``, before running the code (see the sketch after this list).
- The option :obj:`no_multi_processing` should only be set to :obj:`True` for testing and debugging. To ensure
  accurate memory measurement, it is recommended to run each memory benchmark in a separate process by making sure
  :obj:`no_multi_processing` is set to :obj:`False`.
- One should always state the environment information when sharing the results of a model benchmark. Results can vary
  heavily between different GPU devices, library versions, etc., so benchmark results on their own are not very
  useful to the community.
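
Putting the first practice into action could look as sketched below; the command line flags are assumed to mirror the
fields of the argument data classes and should be confirmed via the ``--help`` commands above:

.. code-block:: bash

    ## PYTORCH CODE
    # Pin the benchmark to a single GPU, as recommended above.
    export CUDA_VISIBLE_DEVICES=0
    python examples/benchmarking/run_benchmark.py --models bert-base-uncased --batch_sizes 8 --sequence_lengths 8 32 128 512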


Sharing your benchmark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Previously, all available core models (10 at the time) were benchmarked for `inference time`, across many different
settings: using PyTorch, with and without TorchScript, using TensorFlow, with and without XLA. All of those tests were
done across CPUs (except for TensorFlow XLA) and GPUs.

The approach is detailed in the `following blog post
<https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2>`__ and the results are
available `here
<https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing>`__.

With the new `benchmark` tools, it is easier than ever to share your benchmark results with the community `here
<https://github.com/huggingface/transformers/blob/master/examples/benchmarking/README.md>`__.