gaoqiong / flash-attention
Commit 25387b24, authored Nov 14, 2022 by Tri Dao
Mention AITemplate Stable Diffusion in usage.md
parent 2e33fc8e
Showing 1 changed file with 16 additions and 9 deletions

usage.md (+16, -9)
@@ -64,6 +64,11 @@ yields the fastest BERT training on cloud instances in MLPerf training 2.0 (June
 of Stable Diffusion: with FlashAttention as one of its components, it speeds up
 pretraining by up to 6.5x, and reduces the hardware cost of fine-tuning by 7x.
+- Meta's [AITemplate](https://ai.facebook.com/blog/gpu-inference-engine-nvidia-amd-open-source/) with FlashAttention one of the components, is currently the [fastest](https://twitter.com/bing_xu_/status/1590447334055632897) Stable Diffusion inference engine that we know of.
 - Stable Diffusion inference from [Labml.ai](https://twitter.com/labmlai/status/1573634095732490240): 50% speedup.
@@ -84,8 +89,10 @@ yields the fastest BERT training on cloud instances in MLPerf training 2.0 (June
 language and compiler for parallel programming.
 - [xformers](https://github.com/facebookresearch/xformers): The xformers team has implemented [memory-efficient attention](https://twitter.com/fvsmassa/status/1580229170629849089) in a similar spirit to FlashAttention.
+  xformers dynamically dispatches to whichever implementation is available / faster.
 - [Jax](https://github.com/google/jax): an [implementation](https://github.com/lucidrains/flash-attention-jax) in Jax by [lucidrains](https://github.com/lucidrains/).
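The xformers line added above refers to runtime backend selection: xformers' memory-efficient attention op picks whichever kernel (for example a FlashAttention-based or CUTLASS-based one) is installed and supports the given inputs. Below is a minimal usage sketch of that op; it is not part of this commit, and the tensor shapes, dtype, and device are illustrative assumptions.

```python
# Minimal sketch (not from the commit): calling xformers' memory-efficient attention.
# xformers dispatches internally to whichever implementation is available / faster
# for these inputs; shapes, dtype, and device here are illustrative assumptions.
import torch
import xformers.ops as xops

batch, seqlen, heads, head_dim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Output keeps the same [batch, seqlen, heads, head_dim] layout as the query.
out = xops.memory_efficient_attention(q, k, v)
```

Which kernel actually runs depends on the installed xformers build and the GPU; the call itself stays the same.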