Unverified Commit 12286612 authored by Stas Bekman, committed by GitHub

[bf16 support] tweaks (#14580)



* [bf16 support] tweaks

* corrections
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>
parent 16870d11
@@ -303,7 +303,7 @@ In 🤗 Transformers the full fp16 inference is enabled by passing `--fp16_full_
 #### bf16
-If you own the new Ampere hardware you can start using bf16 for your training and evaluation. While bf16 has a worse precision than fp16, it has a much much bigger dynamic range. Therefore, if in the past you were experiencing overflow issues while training the model, bf16 will prevent this from happening most of the time. Remember that in fp16 the biggest number you can have is `65535` and any number above that will overflow. a bf16 number can be as large as `3.39e+38` (!) which is about the same as fp32 - because both have 8-bits used for the numerical range.
+If you own Ampere or newer hardware you can start using bf16 for your training and evaluation. While bf16 has a worse precision than fp16, it has a much much bigger dynamic range. Therefore, if in the past you were experiencing overflow issues while training the model, bf16 will prevent this from happening most of the time. Remember that in fp16 the biggest number you can have is `65535` and any number above that will overflow. A bf16 number can be as large as `3.39e+38` (!) which is about the same as fp32 - because both have 8-bits used for the numerical range.
 Automatic Mixed Precision (AMP) is the same as with fp16, except it'll use bf16.
@@ -311,7 +311,7 @@ Thanks to the fp32-like dynamic range with bf16 mixed precision loss scaling is
 If you have tried to finetune models pre-trained under bf16 mixed precision (e.g. T5) it's very likely that you have encountered overflow issues. Now you should be able to finetune those models without any issues.
-That's said also be aware that if you pre-trained a model in bf16, it's likely to have overflow issues if someone tries to finetune it in fp16 down the road. So once started on the bf16-mode path it's best to remain on it and not switch to fp16.
+That said, also be aware that if you pre-trained a model in bf16, it's likely to have overflow issues if someone tries to finetune it in fp16 down the road. So once started on the bf16-mode path it's best to remain on it and not switch to fp16.
 In 🤗 Transformers bf16 mixed precision is enabled by passing `--bf16` to the 🤗 Trainer.
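As a quick illustration of the dynamic-range difference the doc describes, here is a minimal sketch (not part of this commit's diff; exact printed values assume a recent PyTorch build):

```python
import torch

# fp16 tops out around 65504; going past that overflows to inf.
x = torch.tensor(65504.0, dtype=torch.float16)
print(x * 2)  # tensor(inf, dtype=torch.float16)

# bf16 keeps an fp32-like exponent, so the same value is nowhere near its limit.
y = torch.tensor(65504.0, dtype=torch.bfloat16)
print(y * 2)  # tensor(131072., dtype=torch.bfloat16)

print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e+38
```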
@@ -345,7 +345,7 @@ In 🤗 Transformers the full bf16 inference is enabled by passing `--bf16_full_
 The Ampere hardware uses a magical data type called tf32. It has the same numerical range as fp32 (8-bits), but instead of 23 bits precision it has only 10 bits (same as fp16). In total it uses only 19 bits.
-It's magical in a sense that you can use the normal fp32 training and/or inference code and by enabling tf32 support you can get up to 3x throughput improvement. All you need to do is to add this to your code:
+It's magical in the sense that you can use the normal fp32 training and/or inference code and by enabling tf32 support you can get up to 3x throughput improvement. All you need to do is to add this to your code:
 ```
 import torch
...
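The code block in the hunk above is cut off by the diff truncation. A plausible completion, using the standard PyTorch TF32 switches (treat it as a sketch rather than the exact docs snippet):

```python
import torch

# Allow TF32 for CUDA matmuls and cuDNN convolutions on Ampere-or-newer GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```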
@@ -29,6 +29,7 @@ from .file_utils import (
     is_sagemaker_dp_enabled,
     is_sagemaker_mp_enabled,
     is_torch_available,
+    is_torch_bf16_available,
     is_torch_tf32_available,
     is_torch_tpu_available,
     torch_required,
@@ -794,6 +795,9 @@ class TrainingArguments:
             )
             self.half_precision_backend = self.fp16_backend
+        if (self.bf16 or self.bf16_full_eval) and not is_torch_bf16_available():
+            raise ValueError("Your setup doesn't support bf16. You need Ampere GPU, torch>=1.10, cuda>=11.0")
         if self.fp16 and self.bf16:
             raise ValueError("At most one of fp16 and bf16 can be True, but not both")
         if self.bf16:
...
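The effect of the new guard in TrainingArguments is that requesting bf16 on an unsupported setup now fails fast at argument construction. A minimal sketch, assuming a transformers version that includes this change (the output_dir value is arbitrary):

```python
from transformers import TrainingArguments

try:
    # On a machine without an Ampere GPU / torch>=1.10 / CUDA>=11, this raises immediately.
    args = TrainingArguments(output_dir="test_output", bf16=True)
except ValueError as err:
    print(err)  # Your setup doesn't support bf16. You need Ampere GPU, torch>=1.10, cuda>=11.0
```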