Unverified Commit 26475082 authored by Steven Liu, committed by GitHub

[docs] Attention checks (#12486)



* checks

* feedback

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
parent f072c64b
@@ -81,6 +81,45 @@ with attention_backend("_flash_3_hub"):
> [!TIP]
> Most attention backends support `torch.compile` without graph breaks and can be used to further speed up inference.
## Checks
The attention dispatcher includes debugging checks that catch common input errors before they reach an attention backend, as sketched after the list below.
1. Device checks verify that query, key, and value tensors live on the same device.
2. Data type checks confirm tensors have matching dtypes and use either bfloat16 or float16.
3. Shape checks validate tensor dimensions and prevent mixing attention masks with causal flags.
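Conceptually, the checks amount to assertions like the following. This is an illustrative sketch only, not the dispatcher's actual code; the function name `_example_checks` is hypothetical.
```py
import torch

# Hypothetical sketch of what the dispatcher's checks verify; the real
# implementation lives in diffusers.models.attention_dispatch.
def _example_checks(query, key, value, attn_mask=None, is_causal=False):
    # 1. Device check: query, key, and value must live on the same device
    assert query.device == key.device == value.device
    # 2. Dtype check: matching dtypes, restricted to bfloat16 or float16
    assert query.dtype == key.dtype == value.dtype
    assert query.dtype in (torch.bfloat16, torch.float16)
    # 3. Shape/flag check: an explicit mask and is_causal are mutually exclusive
    assert not (attn_mask is not None and is_causal)
```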
Enable these checks by setting the `DIFFUSERS_ATTN_CHECKS` environment variable. Checks add overhead to every attention operation, so they're disabled by default.
```bash
export DIFFUSERS_ATTN_CHECKS=yes
```
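You can also set the variable from Python, as long as it happens before `diffusers` is imported. This assumes the flag is read once at import time, so set it first.
```py
import os

# Assumption: the flag is read when diffusers is first imported,
# so it must be set before the import below.
os.environ["DIFFUSERS_ATTN_CHECKS"] = "yes"

import diffusers  # checks are now enabled for attention dispatch
```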
The checks now run before every attention operation.
```py
import torch
from diffusers.models.attention_dispatch import attention_backend, dispatch_attention_fn

# query, key, and value share a device and a supported dtype
query = torch.randn(1, 10, 8, 64, dtype=torch.bfloat16, device="cuda")
key = torch.randn(1, 10, 8, 64, dtype=torch.bfloat16, device="cuda")
value = torch.randn(1, 10, 8, 64, dtype=torch.bfloat16, device="cuda")
try:
    with attention_backend("flash"):
        output = dispatch_attention_fn(query, key, value)
    print("✓ Flash Attention works with checks enabled")
except Exception as e:
    print(f"✗ Flash Attention failed: {e}")
```
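With the checks enabled, invalid inputs are rejected before the backend runs. Here is a minimal sketch reusing the tensors from the example above, where a deliberate dtype mismatch trips the dtype check:
```py
# Deliberately break the dtype check: key no longer matches query/value.
bad_key = key.to(torch.float16)

try:
    with attention_backend("flash"):
        output = dispatch_attention_fn(query, bad_key, value)
    print("✓ unexpectedly passed")
except Exception as e:
    print(f"✗ rejected by the dtype check: {e}")
```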
You can also enable the checks by configuring the registry directly.
```py
from diffusers.models.attention_dispatch import _AttentionBackendRegistry
_AttentionBackendRegistry._checks_enabled = True
```
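Because `_checks_enabled` is a private attribute, it may change between releases; if you toggle it, consider restoring the previous value afterwards. A sketch:
```py
from diffusers.models.attention_dispatch import _AttentionBackendRegistry

# Enable the checks only around a specific region of code,
# then restore whatever value was set before.
previous = _AttentionBackendRegistry._checks_enabled
_AttentionBackendRegistry._checks_enabled = True
try:
    with attention_backend("flash"):
        output = dispatch_attention_fn(query, key, value)
finally:
    _AttentionBackendRegistry._checks_enabled = previous
```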
## Available backends
Refer to the table below for a complete list of available attention backends and their variants.