gaoqiong / flash-attention
Commit 23e8fa5a (unverified), authored Mar 27, 2024 by Driss Guessous and committed via GitHub on Mar 27, 2024.
Add the option for the macro and note (#893)
Parent: 3e9414f1
Showing 1 changed file with 8 additions and 1 deletion.
csrc/flash_attn/src/softmax.h (+8, -1)
@@ -78,7 +78,14 @@ __forceinline__ __device__ void scale_apply_exp2(Tensor<Engine0, Layout0> &tensor
            // Instead of computing exp(x - max), we compute exp2(x * log_2(e) -
            // max * log_2(e)) This allows the compiler to use the ffma
            // instruction instead of fadd and fmul separately.
            // The following macro will disable the use of fma.
            // See: https://github.com/pytorch/pytorch/issues/121558 for more details
            // This macro is set in PyTorch and not FlashAttention
            #ifdef UNFUSE_FMA
                tensor(mi, ni) = exp2f(__fmul_rn(tensor(mi, ni), scale) - max_scaled);
            #else
                tensor(mi, ni) = exp2f(tensor(mi, ni) * scale - max_scaled);
            #endif
        }
    }
}
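Why the new branch matters: with the default expression, nvcc's contraction (--fmad=true by default) may fold the multiply and subtraction in exp2f(tensor(mi, ni) * scale - max_scaled) into a single FFMA, which rounds once; __fmul_rn() is an explicitly rounded multiply that the compiler will not contract, so the UNFUSE_FMA path rounds the product and the difference separately and can differ from the fused result in the last bit. The standalone CUDA sketch below is not part of the commit; the kernel and constant names (compare_paths, LOG2E) are illustrative only. It evaluates both expressions on the same scalar inputs so the two rounding behaviours can be compared.

// Minimal sketch, assuming nvcc and any recent GPU; not FlashAttention code.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void compare_paths(float x, float scale, float max_scaled,
                              float *out_fused, float *out_unfused) {
    // Default path: the compiler may contract x * scale - max_scaled into one
    // FFMA, so the product is not rounded before the subtraction.
    *out_fused = exp2f(x * scale - max_scaled);
    // UNFUSE_FMA path: __fmul_rn() forces a separately rounded multiply that
    // the compiler does not contract with the following subtraction.
    *out_unfused = exp2f(__fmul_rn(x, scale) - max_scaled);
}

int main() {
    const float LOG2E = 1.4426950408889634f;  // log2(e), the factor folded into `scale`
    float h[2];
    float *d;
    cudaMalloc(&d, sizeof(h));
    // Arbitrary inputs; for some values the two paths differ in the last bit.
    compare_paths<<<1, 1>>>(0.3f, LOG2E, 0.25f, d, d + 1);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    printf("fused:   %.9g\nunfused: %.9g\n", h[0], h[1]);
    cudaFree(d);
    return 0;
}

Compiling the sketch with nvcc --fmad=false should make the two outputs match, which mirrors what defining UNFUSE_FMA (set by PyTorch's build, per the comment above) achieves for just this expression without disabling contraction elsewhere in the kernel.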