"vscode:/vscode.git/clone" did not exist on "462956be7b057ba1d156e9405289c39db56106bb"
Commit 2d5b2483 authored by Dan Fu

Speedup graph for A100, d128

parent 5d07483b
@@ -71,6 +71,14 @@ Memory savings are proportional to sequence length -- since standard attention has memory quadratic in sequence length, whereas FlashAttention has memory linear in sequence length.
We see 10X memory savings at sequence length 2K, and 20X at 4K.
As a result, FlashAttention can scale to much longer sequence lengths.
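As a rough illustration of why the savings grow with sequence length: standard attention materializes a seqlen x seqlen score matrix per head, so its extra memory is quadratic in sequence length, while FlashAttention's memory stays linear. The sketch below uses illustrative shapes (batch 16, 12 heads, head dimension 64, fp16), not necessarily the exact configuration behind the memory graph.

```python
# Back-of-the-envelope sketch with assumed shapes (batch 16, 12 heads,
# head dimension 64, fp16) -- illustrative values, not the graph's exact setup.
batch, heads, headdim, bytes_per_el = 16, 12, 64, 2  # fp16 = 2 bytes per element

for seqlen in (1024, 2048, 4096):
    # Standard attention stores the (seqlen x seqlen) score matrix for every head.
    score_matrix_gb = batch * heads * seqlen * seqlen * bytes_per_el / 1e9
    # Q, K, V themselves only grow linearly with sequence length.
    qkv_gb = 3 * batch * heads * seqlen * headdim * bytes_per_el / 1e9
    print(f"seqlen {seqlen}: score matrix {score_matrix_gb:.2f} GB, Q/K/V {qkv_gb:.2f} GB")
```

Doubling the sequence length quadruples the score-matrix memory but only doubles the Q/K/V memory, which is why the savings keep growing with sequence length.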
#### Head Dimension 128
![FlashAttention speedup, head dimension 128](assets/flashattn_speedup_a100_d128.jpg)
We show speedup with head dimension 128, using batch size 16 with 12 heads.
Speedup is smaller than for the smaller head dimensions, but it is still significant -- especially with a causal mask.
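For readers who want to reproduce a point on this graph, here is a minimal timing sketch, not the repository's benchmark script. It assumes a recent PyTorch whose fused FlashAttention path is reachable through `torch.nn.functional.scaled_dot_product_attention`, times the forward pass only, and matches the setting above (batch 16, 12 heads, head dimension 128, sequence length 2K); the naive `standard_attention` baseline and `time_fn` helper are ours.

```python
import torch
import torch.nn.functional as F

batch, heads, seqlen, headdim = 16, 12, 2048, 128
q, k, v = (torch.randn(batch, heads, seqlen, headdim, device="cuda", dtype=torch.float16)
           for _ in range(3))

def standard_attention(q, k, v, causal=True):
    # Materializes the full (seqlen x seqlen) score matrix -- the memory- and
    # bandwidth-heavy step that FlashAttention avoids.
    scores = (q @ k.transpose(-2, -1)) / headdim ** 0.5
    if causal:
        mask = torch.triu(torch.ones(seqlen, seqlen, device=q.device, dtype=torch.bool),
                          diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

def time_fn(fn, iters=30):
    for _ in range(5):  # warmup
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average ms per call

t_std = time_fn(lambda: standard_attention(q, k, v))
t_fused = time_fn(lambda: F.scaled_dot_product_attention(q, k, v, is_causal=True))
print(f"standard: {t_std:.2f} ms  fused: {t_fused:.2f} ms  speedup: {t_std / t_fused:.1f}x")
```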
### RTX 3090
For the RTX 3090, we use batch size 12 with 12 attention heads.