- Our own Stable Diffusion [fork](https://twitter.com/realDanFu/status/1580641495991754752) uses FlashAttention to get a 3-4x speedup compared to the original version.
## Other models
- [Uni-Fold](https://github.com/dptech-corp/Uni-Fold): Uni-Fold is an
...
...
- [Triton](https://github.com/openai/triton): an [implementation](https://github.com/openai/triton/blob/master/python/tutorials/06-fused-attention.py) of
FlashAttention in Triton by Phil Tillet from OpenAI. Triton is a Python-based
language and compiler for parallel programming.
- [xformers](https://github.com/facebookresearch/xformers): The xformers team
has implemented [memory-efficient attention](https://twitter.com/fvsmassa/status/1580229170629849089) in a similar spirit to FlashAttention.
xformers dynamically dispatches to whichever implementation is available / faster (see the usage sketch after this list).
- [Jax](https://github.com/google/jax): an [implementation](https://github.com/lucidrains/flash-attention-jax)
in Jax by [lucidrains](https://github.com/lucidrains/).
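
As a rough illustration of the xformers dispatching mentioned above, here is a minimal sketch of calling its memory-efficient attention on half-precision CUDA tensors. It assumes xformers is installed and exposes `xformers.ops.memory_efficient_attention`; the tensor shapes are made up for the example.

```python
import torch
import xformers.ops as xops

# Illustrative shapes: [batch, seq_len, num_heads, head_dim].
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

# xformers picks the fastest backend available for these inputs
# (on recent GPUs this can be a FlashAttention-style fused kernel).
out = xops.memory_efficient_attention(q, k, v)
# out has the same shape as q: [2, 1024, 8, 64]
```
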