gaoqiong / flash-attention · Commits

Commit 23e8fa5a (unverified)
Authored Mar 27, 2024 by Driss Guessous, committed by GitHub on Mar 27, 2024

Add the option for the macro and note (#893)

Parent: 3e9414f1
Changes: 1
Showing 1 changed file with 8 additions and 1 deletion.

csrc/flash_attn/src/softmax.h (+8, -1)
@@ -78,7 +78,14 @@ __forceinline__ __device__ void scale_apply_exp2(Tensor<Engine0, Layout0> &tenso
            // Instead of computing exp(x - max), we compute exp2(x * log_2(e) -
            // max * log_2(e)) This allows the compiler to use the ffma
            // instruction instead of fadd and fmul separately.
            // The following macro will disable the use of fma.
            // See: https://github.com/pytorch/pytorch/issues/121558 for more details
            // This macro is set in PyTorch and not FlashAttention
            #ifdef UNFUSE_FMA
                tensor(mi, ni) = exp2f(__fmul_rn(tensor(mi, ni), scale) - max_scaled);
            #else
                tensor(mi, ni) = exp2f(tensor(mi, ni) * scale - max_scaled);
            #endif
        }
    }
}
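For readers unfamiliar with the trade-off this patch addresses, the sketch below is a standalone CUDA demo, not part of this repository: the kernel name scale_exp2_demo, the host driver, and the sample values are invented for illustration, while exp2f and __fmul_rn are the real CUDA intrinsics used in the diff above. It isolates the two expressions that UNFUSE_FMA toggles between: the default form, which the compiler may contract into a single ffma, and the __fmul_rn form, which rounds the multiply on its own first, so the two results can differ in the last bit, as discussed in the linked PyTorch issue.

// Hypothetical standalone sketch; names and sample inputs are made up.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale_exp2_demo(const float* x, float* fused, float* unfused,
                                float scale, float max_scaled, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Default path: the compiler may contract the multiply-subtract into one ffma.
    fused[i]   = exp2f(x[i] * scale - max_scaled);
    // UNFUSE_FMA path: __fmul_rn performs a separately rounded multiply and is
    // never contracted into an fma.
    unfused[i] = exp2f(__fmul_rn(x[i], scale) - max_scaled);
}

int main() {
    const int n = 4;
    float h_x[n] = {0.1f, 1.0f, 7.5f, 42.0f};
    float *d_x, *d_fused, *d_unfused;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_fused, n * sizeof(float));
    cudaMalloc(&d_unfused, n * sizeof(float));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

    // scale = log2(e), matching the exp -> exp2 rewrite in the comment above;
    // max_scaled is an arbitrary running-max value for the demo.
    scale_exp2_demo<<<1, 32>>>(d_x, d_fused, d_unfused, 1.4426950408889634f, 3.0f, n);

    float h_fused[n], h_unfused[n];
    cudaMemcpy(h_fused, d_fused, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_unfused, d_unfused, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) {
        printf("x=%g  fused=%.9g  unfused=%.9g\n", h_x[i], h_fused[i], h_unfused[i]);
    }
    cudaFree(d_x); cudaFree(d_fused); cudaFree(d_unfused);
    return 0;
}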