Unverified Commit 12286612 authored by Stas Bekman, committed by GitHub

[bf16 support] tweaks (#14580)



* [bf16 support] tweaks

* corrections
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>
parent 16870d11
@@ -303,7 +303,7 @@ In 🤗 Transformers the full fp16 inference is enabled by passing `--fp16_full_
 #### bf16
-If you own the new Ampere hardware you can start using bf16 for your training and evaluation. While bf16 has a worse precision than fp16, it has a much much bigger dynamic range. Therefore, if in the past you were experiencing overflow issues while training the model, bf16 will prevent this from happening most of the time. Remember that in fp16 the biggest number you can have is `65535` and any number above that will overflow. a bf16 number can be as large as `3.39e+38` (!) which is about the same as fp32 - because both have 8-bits used for the numerical range.
+If you own Ampere or newer hardware you can start using bf16 for your training and evaluation. While bf16 has a worse precision than fp16, it has a much much bigger dynamic range. Therefore, if in the past you were experiencing overflow issues while training the model, bf16 will prevent this from happening most of the time. Remember that in fp16 the biggest number you can have is `65535` and any number above that will overflow. A bf16 number can be as large as `3.39e+38` (!) which is about the same as fp32 - because both have 8-bits used for the numerical range.
 Automatic Mixed Precision (AMP) is the same as with fp16, except it'll use bf16.
@@ -311,7 +311,7 @@ Thanks to the fp32-like dynamic range with bf16 mixed precision loss scaling is
 If you have tried to finetune models pre-trained under bf16 mixed precision (e.g. T5) it's very likely that you have encountered overflow issues. Now you should be able to finetune those models without any issues.
-That's said also be aware that if you pre-trained a model in bf16, it's likely to have overflow issues if someone tries to finetune it in fp16 down the road. So once started on the bf16-mode path it's best to remain on it and not switch to fp16.
+That said, also be aware that if you pre-trained a model in bf16, it's likely to have overflow issues if someone tries to finetune it in fp16 down the road. So once started on the bf16-mode path it's best to remain on it and not switch to fp16.
 In 🤗 Transformers bf16 mixed precision is enabled by passing `--bf16` to the 🤗 Trainer.
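As a quick illustration of the dynamic-range difference the doc describes, here is a minimal sketch (not part of this commit's diff; exact printed values assume a recent PyTorch build):

```python
import torch

# fp16 tops out around 65504; going past that overflows to inf.
x = torch.tensor(65504.0, dtype=torch.float16)
print(x * 2)  # tensor(inf, dtype=torch.float16)

# bf16 keeps an fp32-like exponent, so the same value is nowhere near its limit.
y = torch.tensor(65504.0, dtype=torch.bfloat16)
print(y * 2)  # tensor(131072., dtype=torch.bfloat16)

print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e+38
```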
@@ -345,7 +345,7 @@ In 🤗 Transformers the full bf16 inference is enabled by passing `--bf16_full_
 The Ampere hardware uses a magical data type called tf32. It has the same numerical range as fp32 (8-bits), but instead of 23 bits precision it has only 10 bits (same as fp16). In total it uses only 19 bits.
-It's magical in a sense that you can use the normal fp32 training and/or inference code and by enabling tf32 support you can get up to 3x throughput improvement. All you need to do is to add this to your code:
+It's magical in the sense that you can use the normal fp32 training and/or inference code and by enabling tf32 support you can get up to 3x throughput improvement. All you need to do is to add this to your code:
 ```
 import torch
...
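The code block in the hunk above is cut off by the diff truncation. A plausible completion, using the standard PyTorch TF32 switches (treat it as a sketch rather than the exact docs snippet):

```python
import torch

# Allow TF32 for CUDA matmuls and cuDNN convolutions on Ampere-or-newer GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```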
@@ -29,6 +29,7 @@ from .file_utils import (
     is_sagemaker_dp_enabled,
     is_sagemaker_mp_enabled,
     is_torch_available,
+    is_torch_bf16_available,
     is_torch_tf32_available,
     is_torch_tpu_available,
     torch_required,
@@ -794,6 +795,9 @@ class TrainingArguments:
             )
             self.half_precision_backend = self.fp16_backend
+        if (self.bf16 or self.bf16_full_eval) and not is_torch_bf16_available():
+            raise ValueError("Your setup doesn't support bf16. You need Ampere GPU, torch>=1.10, cuda>=11.0")
         if self.fp16 and self.bf16:
             raise ValueError("At most one of fp16 and bf16 can be True, but not both")
         if self.bf16:
...
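The effect of the new guard in TrainingArguments is that requesting bf16 on an unsupported setup now fails fast at argument construction. A minimal sketch, assuming a transformers version that includes this change (the output_dir value is arbitrary):

```python
from transformers import TrainingArguments

try:
    # On a machine without an Ampere GPU / torch>=1.10 / CUDA>=11, this raises immediately.
    args = TrainingArguments(output_dir="test_output", bf16=True)
except ValueError as err:
    print(err)  # Your setup doesn't support bf16. You need Ampere GPU, torch>=1.10, cuda>=11.0
```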