Commit 40a25c8e authored by Tri Dao

Update roadmap

parent eff9fe6b
@@ -37,6 +37,10 @@ As Triton is a higher-level language than CUDA, it might be easier to understand
and experiment with. The notations in the Triton implementation are also closer
to what's used in our paper.
We also have an experimental implementation in Triton that supports attention
bias (e.g. ALiBi):
https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/flash_attn_triton.py
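
As a rough illustration, the Triton kernel could be called with an ALiBi-style bias along the lines of the sketch below. The entry point `flash_attn_func`, its keyword arguments, and the expected tensor layout `(batch, seqlen, nheads, headdim)` are assumptions inferred from `flash_attn/flash_attn_triton.py` and may differ from the actual experimental API.

```python
# Minimal sketch (assumed API): experimental Triton attention with an ALiBi-style bias.
import torch
from flash_attn.flash_attn_triton import flash_attn_func  # assumed entry point

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# ALiBi-style bias: one slope per head, a linear penalty proportional to the
# query-key distance, broadcast to shape (batch, nheads, seqlen_q, seqlen_k).
slopes = torch.tensor([2 ** (-8.0 * (h + 1) / nheads) for h in range(nheads)],
                      device="cuda", dtype=torch.float16)
distance = (torch.arange(seqlen, device="cuda")[None, :]
            - torch.arange(seqlen, device="cuda")[:, None]).to(torch.float16)
bias = (-slopes.view(1, nheads, 1, 1) * distance.abs())
bias = bias.expand(batch, nheads, seqlen, seqlen).contiguous()

# Keyword names `bias` and `causal` are assumptions based on the Triton file.
out = flash_attn_func(q, k, v, bias=bias, causal=True)
```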
## Installation and features
@@ -76,10 +80,6 @@ Our tentative roadmap:
6. ~~[Jul 2022] Support head dimension 128~~[Done].
7. ~~[Aug 2022] Fuse rotary embedding~~[Done].
8. ~~[Mar 2023] Support SM90 GPUs (H100)~~[Done].
9. [Apr 2023] Refactor to use Cutlass 3.x.
10. [May 2023] Support attention bias (e.g. ALiBi, relative positional encoding).
11. [Jun 2023] Support SM70 GPUs (V100).
12. [Jun 2023] Support fp8 (H100).
## How to use FlashAttention