Commit 4decc3c1 authored by Dan Fu

README typo

parent dc6d1300
@@ -35,6 +35,7 @@ We display FlashAttention speedup using these parameters (similar to BERT-base):
* Batch size 8
* Head dimension 64
* 12 attention heads
Our graphs show sequence lengths between 128 and 4096 (when standard attention runs out of memory on an A100), but FlashAttention can scale up to sequence length 64K.
#### Speedup
...
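As an illustration of how a speedup measurement with the parameters listed above might look, here is a minimal timing sketch. It is not the repository's benchmark script: it compares a naive PyTorch attention implementation (which materializes the full sequence-length-squared score matrix) against `torch.nn.functional.scaled_dot_product_attention`, which can dispatch to a fused FlashAttention-style kernel on recent GPUs. All function names and the timing loop are illustrative assumptions, not part of the original README.

```python
# Illustrative sketch only -- not the repo's benchmark code.
# Compares naive attention against PyTorch's fused scaled_dot_product_attention
# using the BERT-base-like shapes quoted above (batch 8, 12 heads, head dim 64).
import time
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Materializes the full (seqlen x seqlen) score matrix; this quadratic
    # intermediate is what exhausts memory at long sequence lengths.
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    return torch.matmul(torch.softmax(scores, dim=-1), v)

def bench(fn, q, k, v, iters=10):
    for _ in range(3):  # warmup
        fn(q, k, v)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        fn(q, k, v)
    torch.cuda.synchronize()
    return (time.time() - start) / iters

if __name__ == "__main__":
    batch, heads, head_dim = 8, 12, 64  # parameters from the list above
    for seqlen in (128, 512, 1024, 2048, 4096):
        shape = (batch, heads, seqlen, head_dim)
        q, k, v = (torch.randn(shape, device="cuda", dtype=torch.float16) for _ in range(3))
        t_fused = bench(F.scaled_dot_product_attention, q, k, v)
        try:
            t_naive = bench(naive_attention, q, k, v)
            print(f"seqlen {seqlen}: ~{t_naive / t_fused:.1f}x speedup over naive attention")
        except torch.cuda.OutOfMemoryError:
            print(f"seqlen {seqlen}: naive attention ran out of memory")
```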