gaoqiong / flash-attention
Commit 25387b24, authored Nov 14, 2022 by Tri Dao
Mention AITemplate Stable Diffusion in usage.md
parent 2e33fc8e
Showing 1 changed file with 16 additions and 9 deletions

usage.md (+16, -9)
@@ -64,6 +64,11 @@ yields the fastest BERT training on cloud instances in MLPerf training 2.0 (June
 of Stable Diffusion: with FlashAttention as one of its components, it speeds up
 pretraining by up to 6.5x, and reduces the hardware cost of fine-tuning by 7x.
+- Meta's [AITemplate](https://ai.facebook.com/blog/gpu-inference-engine-nvidia-amd-open-source/) with FlashAttention one of the components, is currently the [fastest](https://twitter.com/bing_xu_/status/1590447334055632897) Stable Diffusion inference engine that we know of.
 - Stable Diffusion inference from [Labml.ai](https://twitter.com/labmlai/status/1573634095732490240): 50% speedup.
@@ -84,8 +89,10 @@ yields the fastest BERT training on cloud instances in MLPerf training 2.0 (June
 language and compiler for parallel programming.
 - [xformers](https://github.com/facebookresearch/xformers): The xformers team has implemented [memory-efficient attention](https://twitter.com/fvsmassa/status/1580229170629849089) in a similar spirit to FlashAttention.
+  xformers dynamically dispatches to whichever implementation is available / faster.
 - [Jax](https://github.com/google/jax): an [implementation](https://github.com/lucidrains/flash-attention-jax) in Jax by [lucidrains](https://github.com/lucidrains/).
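The xformers line added above refers to runtime backend selection: xformers' memory-efficient attention op picks whichever kernel (for example a FlashAttention-based or CUTLASS-based one) is installed and supports the given inputs. Below is a minimal usage sketch of that op; it is not part of this commit, and the tensor shapes, dtype, and device are illustrative assumptions.

```python
# Minimal sketch (not from the commit): calling xformers' memory-efficient attention.
# xformers dispatches internally to whichever implementation is available / faster
# for these inputs; shapes, dtype, and device here are illustrative assumptions.
import torch
import xformers.ops as xops

batch, seqlen, heads, head_dim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Output keeps the same [batch, seqlen, heads, head_dim] layout as the query.
out = xops.memory_efficient_attention(q, k, v)
```

Which kernel actually runs depends on the installed xformers build and the GPU; the call itself stays the same.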