<!---
Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Performance and Scalability

Training ever larger transformer models and deploying them to production comes with a range of challenges. During training, your model may require more GPU memory than is available or be very slow to train, and when you deploy it for inference it may be overwhelmed by the throughput required in the production environment. This documentation is designed to help you navigate these challenges and find the best settings for your use case. The guides are split into training and inference, as each comes with different challenges and solutions, and within each there are separate guides for different kinds of hardware settings (e.g. single vs. multi-GPU for training, or CPU vs. GPU for inference).

![perf_overview](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/perf_overview.png)

This document serves as an overview and entry point for the methods that could be useful for your scenario.

## Training

Training transformer models efficiently requires an accelerator such as a GPU or TPU. The most common case is that you only have a single GPU, but there are also sections about multi-GPU and CPU training (with more coming soon).

<Tip>

Note: Most of the strategies introduced in the single GPU section (such as mixed precision training or gradient accumulation) are generic and apply to training models in general, so make sure to have a look at it before diving into the following sections, such as multi-GPU or CPU training.

</Tip>

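One idea behind mixed precision training, mentioned in the tip above, is loss scaling: half precision cannot represent very small values, so gradients are computed on a scaled-up loss and unscaled again before the optimizer step. Here is a framework-free toy sketch of that idea; the `to_half` function is a crude stand-in for fp16 rounding, not real half-precision arithmetic:

```python
# Toy sketch of loss scaling. Real fp16 cannot represent values below roughly
# 6e-8, so tiny gradients underflow to zero; scaling the loss (and therefore
# the gradients) up before the backward pass, then unscaling before the
# optimizer step, keeps them representable.

def to_half(x, smallest=6e-8):
    # Crude stand-in for fp16: magnitudes below the smallest representable
    # value underflow to zero.
    return 0.0 if abs(x) < smallest else x

grad = 1e-8                      # a tiny gradient
print(to_half(grad))             # 0.0 -- lost without loss scaling

scale = 1024.0                   # loss scale; a power of two, so scaling is exact
scaled = to_half(grad * scale)   # survives thanks to the scaling
print(scaled / scale)            # the original gradient, recovered for the update
```

In real frameworks the loss scale is typically adjusted dynamically and the unscaled gradients are applied to a full-precision copy of the weights.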
### Single GPU

Training large models on a single GPU can be challenging, but there are a number of tools and methods that make it feasible. This section discusses methods such as mixed precision training, gradient accumulation and gradient checkpointing, efficient optimizers, as well as strategies to determine the best batch size.

[Go to single GPU training section](perf_train_gpu_one)

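As a taste of one method from the single GPU guide, gradient accumulation can be reduced to a toy, framework-free sketch; the `grad` function, its squared-error loss, and the data below are illustrative stand-ins, not real transformers code:

```python
# Minimal sketch of gradient accumulation: instead of updating the weights
# after every micro-batch, sum the (scaled) gradients over several micro-batches
# and apply one update, simulating a larger effective batch size.

def grad(w, batch):
    # Toy gradient of a squared-error loss 0.5*(w - x)**2 averaged over the batch.
    return sum(w - x for x in batch) / len(batch)

def train_step(w, micro_batches, lr=0.1):
    accumulated = 0.0
    for batch in micro_batches:
        # With equal-sized micro-batches, scaling each gradient by the number
        # of micro-batches makes the sum equal the full-batch gradient.
        accumulated += grad(w, batch) / len(micro_batches)
    return w - lr * accumulated  # a single optimizer step

w = train_step(0.0, [[1.0, 2.0], [3.0, 4.0]])
print(w)  # 0.25 -- identical to one step on the full batch [1, 2, 3, 4]
```

The memory saving comes from only ever holding one micro-batch's activations at a time, at the cost of more forward/backward passes per update.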
### Multi-GPU

In some cases training on a single GPU is still too slow, or the model is too large to fit. Moving to a multi-GPU setup is the logical next step, but training on multiple GPUs at once comes with new decisions: does each GPU have a full copy of the model, or is the model itself also distributed? In this section we look at data, tensor, and pipeline parallelism.

[Go to multi-GPU training section](perf_train_gpu_many)

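The key property of data parallelism, that averaging per-GPU gradients reproduces the full-batch update, can be shown with a toy stand-in (no real GPUs or communication involved; `grad` is an illustrative toy gradient, not model code):

```python
# Minimal sketch of data parallelism: every replica holds a full copy of the
# weights, computes gradients on its own shard of the batch, and the gradients
# are averaged (an "all-reduce", simulated here by a plain sum) before every
# replica applies the same update.

def grad(w, shard):
    # Toy gradient of a squared-error loss 0.5*(w - x)**2 averaged over the shard.
    return sum(w - x for x in shard) / len(shard)

def data_parallel_step(w, batch, num_replicas=2, lr=0.1):
    size = len(batch) // num_replicas
    shards = [batch[i * size:(i + 1) * size] for i in range(num_replicas)]
    local_grads = [grad(w, shard) for shard in shards]   # per-GPU work
    avg = sum(local_grads) / num_replicas                # simulated all-reduce
    return w - lr * avg                                  # identical update on every replica

w = data_parallel_step(0.0, [1.0, 2.0, 3.0, 4.0])
print(w)  # 0.25 -- same result as a single-GPU step on the whole batch
```

Tensor and pipeline parallelism instead split the model itself across devices, which the multi-GPU guide covers in detail.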
### CPU

[Go to CPU training section](perf_train_cpu)

### TPU

[_Coming soon_](perf_train_tpu)

### Specialized Hardware

[_Coming soon_](perf_train_special)

## Inference

Efficient inference with large models in a production environment can be as challenging as training them. In the following sections we go through the steps to run inference on CPU and single/multi-GPU setups.

### CPU

[Go to CPU inference section](perf_infer_cpu)

### Single GPU

[Go to single GPU inference section](perf_infer_gpu_one)

### Multi-GPU

[Go to multi-GPU inference section](perf_infer_gpu_many)

### Specialized Hardware

[_Coming soon_](perf_infer_special)

## Hardware

In the hardware section you can find tips and tricks for building your own deep learning rig.

[Go to hardware section](perf_hardware)


## Contribute

This document is far from complete, and a lot more needs to be added. If you have additions or corrections to make, please don't hesitate to open a PR; if you aren't sure, start an Issue and we can discuss the details there.

When making contributions claiming that A is better than B, please try to include a reproducible benchmark and/or a link to the source of that information (unless it comes directly from you).