gaoqiong / flash-attention · Commits

Commit a84d0728, authored Dec 05, 2022 by Tri Dao

[Docs] Mention FasterTransformer integration
Parent: 4a6eaa9f

Showing 1 changed file with 5 additions and 0 deletions.
usage.md (+5, -0), view file @ a84d0728
@@ -50,6 +50,11 @@ yields the fastest BERT training on cloud instances in MLPerf training 2.0 (June

  ... uses FlashAttention as part of their approach to speed up Transformer
  inference (up to 5.3x on BERT).
- Nvidia's [FasterTransformer](https://github.com/NVIDIA/FasterTransformer) is a
  state-of-the-art Transformer inference library. As of version
  [5.2](https://github.com/NVIDIA/FasterTransformer/commit/b672f49e256ba7a2d4fc9691d270b60b7fc1a2ff),
  FlashAttention is used as a component of FasterTransformer to speed up GPT inference.
- [Kernl](https://github.com/ELS-RD/kernl) is a library for fast Transformer
  inference. They use FlashAttention as part of their
  [approach](https://twitter.com/pommedeterre33/status/1585284221014245377) to ...
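As context for the diff above: the quantity FlashAttention computes during GPT inference is standard scaled-dot-product attention, softmax(QKᵀ/√d)V. The sketch below is a plain PyTorch reference of that computation, with illustrative shapes and names; it is not code from FasterTransformer, Kernl, or this repository. FlashAttention produces the same output while avoiding materializing the full seqlen × seqlen score matrix in GPU memory.

```python
import math
import torch

def naive_attention(q, k, v, causal=False):
    """Reference attention. q, k, v: (batch, heads, seqlen, head_dim).
    Materializes the full (seqlen x seqlen) score matrix, which is the
    memory cost FlashAttention avoids."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    if causal:
        seqlen = q.shape[-2]
        mask = torch.triu(
            torch.ones(seqlen, seqlen, dtype=torch.bool, device=q.device),
            diagonal=1,
        )
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Example: GPT-style causal attention on random inputs (illustrative sizes).
q = torch.randn(1, 12, 128, 64)
k = torch.randn(1, 12, 128, 64)
v = torch.randn(1, 12, 128, 64)
out = naive_attention(q, k, v, causal=True)
print(out.shape)  # torch.Size([1, 12, 128, 64])
```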