Unverified Commit 26475082 authored by Steven Liu, committed by GitHub

[docs] Attention checks (#12486)



* checks

* feedback

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
parent f072c64b
@@ -81,6 +81,45 @@ with attention_backend("_flash_3_hub"):
> [!TIP]
> Most attention backends support `torch.compile` without graph breaks and can be used to further speed up inference.
## Checks
The attention dispatcher includes debugging checks that catch common input errors before they reach an attention backend, as sketched after the list below.
1. Device checks verify that query, key, and value tensors live on the same device.
2. Data type checks confirm tensors have matching dtypes and use either bfloat16 or float16.
3. Shape checks validate tensor dimensions and prevent mixing attention masks with causal flags.
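Conceptually, the checks amount to assertions like the following. This is an illustrative sketch only, not the dispatcher's actual code; the function name `_example_checks` is hypothetical.
```py
import torch

# Hypothetical sketch of what the dispatcher's checks verify; the real
# implementation lives in diffusers.models.attention_dispatch.
def _example_checks(query, key, value, attn_mask=None, is_causal=False):
    # 1. Device check: query, key, and value must live on the same device
    assert query.device == key.device == value.device
    # 2. Dtype check: matching dtypes, restricted to bfloat16 or float16
    assert query.dtype == key.dtype == value.dtype
    assert query.dtype in (torch.bfloat16, torch.float16)
    # 3. Shape/flag check: an explicit mask and is_causal are mutually exclusive
    assert not (attn_mask is not None and is_causal)
```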
Enable these checks by setting the `DIFFUSERS_ATTN_CHECKS` environment variable. Checks add overhead to every attention operation, so they're disabled by default.
```bash
export DIFFUSERS_ATTN_CHECKS=yes
```
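You can also set the variable from Python, as long as it happens before `diffusers` is imported. This assumes the flag is read once at import time, so set it first.
```py
import os

# Assumption: the flag is read when diffusers is first imported,
# so it must be set before the import below.
os.environ["DIFFUSERS_ATTN_CHECKS"] = "yes"

import diffusers  # checks are now enabled for attention dispatch
```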
The checks now run before every attention operation.
```py
import torch
from diffusers.models.attention_dispatch import attention_backend, dispatch_attention_fn

# query, key, and value share a device and a supported dtype
query = torch.randn(1, 10, 8, 64, dtype=torch.bfloat16, device="cuda")
key = torch.randn(1, 10, 8, 64, dtype=torch.bfloat16, device="cuda")
value = torch.randn(1, 10, 8, 64, dtype=torch.bfloat16, device="cuda")
try:
    with attention_backend("flash"):
        output = dispatch_attention_fn(query, key, value)
    print("✓ Flash Attention works with checks enabled")
except Exception as e:
    print(f"✗ Flash Attention failed: {e}")
```
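With the checks enabled, invalid inputs are rejected before the backend runs. Here is a minimal sketch reusing the tensors from the example above, where a deliberate dtype mismatch trips the dtype check:
```py
# Deliberately break the dtype check: key no longer matches query/value.
bad_key = key.to(torch.float16)

try:
    with attention_backend("flash"):
        output = dispatch_attention_fn(query, bad_key, value)
    print("✓ unexpectedly passed")
except Exception as e:
    print(f"✗ rejected by the dtype check: {e}")
```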
You can also enable the checks by configuring the registry directly.
```py
from diffusers.models.attention_dispatch import _AttentionBackendRegistry
_AttentionBackendRegistry._checks_enabled = True
```
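Because `_checks_enabled` is a private attribute, it may change between releases; if you toggle it, consider restoring the previous value afterwards. A sketch:
```py
from diffusers.models.attention_dispatch import _AttentionBackendRegistry

# Enable the checks only around a specific region of code,
# then restore whatever value was set before.
previous = _AttentionBackendRegistry._checks_enabled
_AttentionBackendRegistry._checks_enabled = True
try:
    with attention_backend("flash"):
        output = dispatch_attention_fn(query, key, value)
finally:
    _AttentionBackendRegistry._checks_enabled = previous
```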
## Available backends
Refer to the table below for a complete list of available attention backends and their variants.