Commit 765741c1 authored by Dan Fu

More explanation

parent 2d5b2483
@@ -77,7 +77,8 @@ As a result, FlashAttention can scale to much longer sequence lengths.
 We show speedup with head dimension 128.
 Here we show batch size 16 with 12 heads.
-Speedup is less than with the smaller head sizes, but speedup is still significant -- especially with a causal mask.
+Speedup is less than with the smaller head sizes, since we have to make the block size smaller in the tiling.
+But speedup is still significant, especially with a causal mask.
 ### RTX 3090
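As a rough illustration of the configuration described in this hunk (batch size 16, 12 heads, head dimension 128, with and without a causal mask), the sketch below times the attention call. It assumes the flash_attn_func interface from the flash-attn package and an arbitrary sequence length of 2048; it is not the script used to produce the reported speedups.

```python
# Minimal timing sketch for the setting described above:
# batch 16, 12 heads, head dimension 128, with and without a causal mask.
# Assumes the flash_attn_func interface from the flash-attn package
# (q, k, v shaped [batch, seqlen, nheads, headdim], fp16/bf16 on CUDA);
# the sequence length of 2048 is an arbitrary choice for illustration.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 16, 2048, 12, 128
q, k, v = (torch.randn(batch, seqlen, nheads, headdim,
                       device="cuda", dtype=torch.float16) for _ in range(3))

def time_attn(causal):
    # Warm up, then time with CUDA events so asynchronous kernel
    # execution is measured correctly.
    for _ in range(3):
        flash_attn_func(q, k, v, causal=causal)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(10):
        flash_attn_func(q, k, v, causal=causal)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / 10  # milliseconds per call

print(f"non-causal: {time_attn(False):.2f} ms, causal: {time_attn(True):.2f} ms")
```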