Unverified commit 0753134f authored by Mohit Sharma, committed by GitHub

Disable the FA backend for SDPA on AMD GPUs (#30850)

* disable fa

* disable fa

* update warning

* update warning
parent 9d889f87
...@@ -1479,6 +1479,16 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix
            config,
            hard_check_only=False if requested_attn_implementation is None else True,
        )
        if (
            torch.version.hip is not None
            and config._attn_implementation == "sdpa"
            and torch.cuda.device_count() > 1
        ):
            logger.warning_once(
                "Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends."
            )
            torch.backends.cuda.enable_flash_sdp(False)
        else:
            config._attn_implementation = "eager"
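The guard added in this diff can be read as a single predicate: disable the flash backend only when all three conditions hold (ROCm build, SDPA attention, more than one GPU). A minimal sketch of that logic as a standalone, hypothetical helper (not part of the PR; the real code reads `torch.version.hip` and `torch.cuda.device_count()` directly and then calls `torch.backends.cuda.enable_flash_sdp(False)`):

```python
def should_disable_flash_sdp(hip_version, attn_implementation, device_count):
    """Mirror the diff's condition: on a ROCm build (torch.version.hip is not
    None), with the "sdpa" attention implementation selected, and more than
    one GPU visible, the flash attention backend for SDPA gets disabled."""
    return (
        hip_version is not None        # ROCm build, i.e. AMD GPUs
        and attn_implementation == "sdpa"
        and device_count > 1           # multi-GPU setup
    )

# ROCm build + SDPA + 2 GPUs -> the FA backend would be disabled
print(should_disable_flash_sdp("5.7", "sdpa", 2))   # True
# CUDA build (hip_version is None) -> left enabled
print(should_disable_flash_sdp(None, "sdpa", 2))    # False
```

Note that `enable_flash_sdp(False)` is a process-wide PyTorch toggle, so after this branch runs, SDPA falls back to the other backends (memory-efficient or math) for all subsequent calls.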