Or make sure your model is correctly cast to float16 or bfloat16.
Have a look at [this detailed blog post](https://pytorch.org/blog/out-of-the-box-acceleration/) to read more about what is possible with the `BetterTransformer` + SDPA API.
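
As a minimal sketch of the dtype cast mentioned above, the snippet below casts a small stand-in `nn.Sequential` model (a hypothetical placeholder, not a model from this guide) to float16 and checks that no parameter is left in float32. It assumes only plain PyTorch.

```python
import torch
from torch import nn

# Hypothetical stand-in model; any nn.Module is cast the same way.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

# Cast all floating-point parameters and buffers to half precision.
# Swap in torch.bfloat16 if your hardware supports it.
model = model.to(torch.float16)

# Collect the parameter dtypes to confirm nothing stayed in float32.
param_dtypes = {p.dtype for p in model.parameters()}
print(param_dtypes)
```

With a Transformers model, the same result is usually obtained at load time by passing `torch_dtype=torch.float16` to `from_pretrained`.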
...
...
@@ -270,4 +272,4 @@ with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable