<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Efficient Training on CPU

This guide focuses on training large models efficiently on CPU.

## Mixed precision with IPEX

IPEX (Intel® Extension for PyTorch) is optimized for CPUs with AVX-512 or above, and is functionally compatible with CPUs that only support AVX2. It is therefore expected to improve performance on Intel CPU generations with AVX-512 or above, while CPUs with only AVX2 (e.g., AMD CPUs or older Intel CPUs) might see better performance under IPEX, but this is not guaranteed. IPEX provides performance optimizations for CPU training with both Float32 and BFloat16. The following sections focus on BFloat16.
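
To see which of these instruction sets a machine offers, one option on x86 Linux is to read the `flags` line of `/proc/cpuinfo`. The sketch below is a minimal example under that assumption. The helper names and the selection of flags (including the BF16-related ones) are illustrative and not part of IPEX.

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names (x86 only). Which flags to report is an
# illustrative choice, not something IPEX prescribes.
FEATURE_FLAGS = {
    "AVX2": "avx2",
    "AVX-512": "avx512f",
    "AVX-512 BF16": "avx512_bf16",
    "AMX BF16": "amx_bf16",
}


def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the flags listed on the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def supported_features(cpuinfo_text: str) -> dict:
    """Map each feature name to True/False depending on the CPU flags."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in FEATURE_FLAGS.items()}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for name, present in supported_features(cpuinfo.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch only covers Linux")
```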

The low-precision data type BFloat16 is natively supported on 3rd Generation Xeon® Scalable Processors (aka Cooper Lake) with the AVX-512 instruction set, and will be supported on the next generation of Intel® Xeon® Scalable Processors with the Intel® Advanced Matrix Extensions (Intel® AMX) instruction set for a further performance boost. Auto Mixed Precision for the CPU backend has been available since PyTorch 1.10. In addition, support for Auto Mixed Precision with BFloat16 on CPU and BFloat16 optimization of operators has been extensively enabled in Intel® Extension for PyTorch, and partially upstreamed to the PyTorch master branch. Users can get better performance and user experience with IPEX Auto Mixed Precision.

See [Auto Mixed Precision](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/features/amp.html) for more details.
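
As a rough illustration of CPU Auto Mixed Precision, the sketch below runs one training step under `torch.autocast` with BFloat16. It uses plain PyTorch only (no IPEX), and the tiny linear model, the random data and the helper name `train_step_bf16` are assumptions made for the example.

```python
import torch
from torch import nn


def train_step_bf16(model, optimizer, inputs, targets):
    """Run one optimization step with CPU autocast in BFloat16."""
    optimizer.zero_grad()
    # Inside this context, eligible ops such as linear layers run in bfloat16.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        outputs = model(inputs)
        # Compute the loss in float32 for numerical stability.
        loss = nn.functional.mse_loss(outputs.float(), targets)
    loss.backward()
    optimizer.step()
    return outputs.dtype, loss.item()


torch.manual_seed(0)
model = nn.Linear(16, 4)  # toy model, for illustration only
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
out_dtype, loss_value = train_step_bf16(
    model, optimizer, torch.randn(8, 16), torch.randn(8, 4)
)
print(out_dtype, loss_value)
```

With IPEX installed, its documentation describes additionally passing the model and optimizer through `ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)` before training; that step is omitted here so the sketch runs without IPEX.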

### IPEX installation:

IPEX releases follow PyTorch releases. To install via pip:

| PyTorch Version   | IPEX version   |
| :---------------: | :----------:   |
| 2.1.x             |  2.1.100+cpu   |
| 2.0.x             |  2.0.100+cpu   |
| 1.13              |  1.13.0+cpu    |
| 1.12              |  1.12.300+cpu  |

Run `pip list | grep torch` to find your PyTorch version, then use the matching IPEX version from the table as `<version_name>`:
```
pip install intel_extension_for_pytorch==<version_name> -f https://developer.intel.com/ipex-whl-stable-cpu
```
You can check the latest versions in [ipex-whl-stable-cpu](https://developer.intel.com/ipex-whl-stable-cpu) if needed.

See [IPEX installation](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/installation.html) for other installation methods.
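
The version lookup can also be scripted. The following sketch reads the installed PyTorch version with the standard library and maps it to the IPEX version from the table in this section. The helper names are hypothetical, and the mapping covers only the releases listed in the table.

```python
from importlib import metadata

# Pairs copied from the table in this section; other releases are not covered.
IPEX_FOR_TORCH = {
    "2.1": "2.1.100+cpu",
    "2.0": "2.0.100+cpu",
    "1.13": "1.13.0+cpu",
    "1.12": "1.12.300+cpu",
}


def ipex_version_for(torch_version: str):
    """Map a PyTorch version such as '2.1.2+cpu' to the table's IPEX version.

    Returns None when the major.minor release is not in the table.
    """
    major_minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    return IPEX_FOR_TORCH.get(major_minor)


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


torch_version = installed_torch_version()
if torch_version is None:
    print("torch is not installed")
else:
    print(torch_version, "->", ipex_version_for(torch_version))
```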

### Usage in Trainer
To enable auto mixed precision with IPEX in Trainer, add `use_ipex`, `bf16` and `use_cpu` to the training command arguments.

Take the [Transformers question-answering](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) example as a use case:

- Training with IPEX using BF16 auto mixed precision on CPU:
<pre> python run_qa.py \
--model_name_or_path bert-base-uncased \
--dataset_name squad \
--do_train \
--do_eval \
--per_device_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/debug_squad/ \
<b>--use_ipex</b> \
<b>--bf16</b> \
<b>--use_cpu</b></pre> 

To enable `use_ipex`, `bf16` and `use_cpu` in your own script, add these parameters to `TrainingArguments` like this:
```diff
training_args = TrainingArguments(
    output_dir=args.output_path,
+   bf16=True,
+   use_ipex=True,
+   use_cpu=True,
    **kwargs
)
```
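
If the same script also has to run on machines without IPEX, one possible approach is to switch `use_ipex` on only when the package can be found. The sketch below builds the keyword arguments with the standard library; the helper name `cpu_training_kwargs` is hypothetical, and whether `bf16` works without IPEX depends on the installed PyTorch and Transformers versions.

```python
import importlib.util


def cpu_training_kwargs(want_bf16: bool = True) -> dict:
    """Build keyword arguments for TrainingArguments for CPU training.

    `use_ipex` is enabled only if the IPEX package is importable, so the
    same script still starts on machines where it is not installed.
    """
    ipex_installed = (
        importlib.util.find_spec("intel_extension_for_pytorch") is not None
    )
    return {
        "use_cpu": True,
        "bf16": want_bf16,
        "use_ipex": ipex_installed,
    }


kwargs = cpu_training_kwargs()
print(kwargs)
# Hypothetical usage, assuming transformers is installed:
# training_args = TrainingArguments(output_dir="out", **kwargs)
```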

### Practice example

Blog: [Accelerating PyTorch Transformers with Intel Sapphire Rapids](https://huggingface.co/blog/intel-sapphire-rapids)