Commit 4577151f, authored Jul 11, 2022 by Tri Dao

Link to Triton implementation

Parent: bc2c2102
Showing 1 changed file, with 10 additions and 0 deletions:

README.md (+10 −0)
@@ -8,6 +8,16 @@ Paper: https://arxiv.org/abs/2205.14135
IEEE Spectrum [article](https://spectrum.ieee.org/mlperf-rankings-2022) about our submission to the MLPerf 2.0 benchmark using FlashAttention.


#### Triton implementation of FlashAttention
Phil Tillet (OpenAI) has an implementation of FlashAttention in Triton:
https://github.com/openai/triton/blob/master/python/tutorials/06-fused-attention.py
As Triton is a higher-level language than CUDA, it might be easier to understand
and experiment with. The notations in the Triton implementation are also closer
to what's used in our paper.
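
To give a flavor of why Triton can be easier to understand and experiment with than CUDA, here is a minimal, generic Triton kernel (a block-wise vector add). It is an illustrative sketch only, not part of FlashAttention or of the fused-attention tutorial linked above; the point is that indexing, masking, and memory access are written at the block level rather than per thread.

```python
# Illustrative sketch: a generic block-wise vector add in Triton.
# Not part of FlashAttention; kernel and parameter names are our own.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                    # one program instance per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)        # whole-block load, no per-thread code
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)             # number of blocks to launch
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```
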
## Alpha release (0.1).

To compile (requiring CUDA 11, NVCC, and a Turing or Ampere GPU):
...