OpenDAS / FlashMLA
Commit 6a04965a, authored Feb 27, 2026 by zhanghj2
Remove __syncthreads and branch
Parent: ae382f02
Showing 1 changed file with 8 additions and 6 deletions:
csrc/extension/softmax.h (+8, -6)
csrc/extension/softmax.h @ 6a04965a
@@ -55,12 +55,14 @@ __device__ __forceinline__ void warp_allreduce_(Tensor<Engine0, Layout0> &dst, T
         // smem_reduce(row, col) = dst(0);
     }
     __syncthreads();
-    if (tidx < 16)
-    {
-        smem_reduce(row + 64) = op(op(smem_reduce(row * 4), smem_reduce(row * 4 + 1)), op(smem_reduce(row * 4 + 2), smem_reduce(row * 4 + 3)));
-    }
-    __syncthreads();
-    dst(0) = smem_reduce(row + 64);
+    // if (tidx < 16)
+    // {
+    // smem_reduce(row + 64) = op(op(smem_reduce(row * 4), smem_reduce(row * 4 + 1)), op(smem_reduce(row * 4 + 2), smem_reduce(row * 4 + 3)));
+    // }
+    // __syncthreads();
+    // dst(0) = smem_reduce(row + 64);
+    dst(0) = op(op(smem_reduce(row * 4), smem_reduce(row * 4 + 1)), op(smem_reduce(row * 4 + 2), smem_reduce(row * 4 + 3)));
 }
 template <typename Engine0, typename Layout0, typename Engine1, typename Layout1, typename Operator>
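A minimal host-side C++ sketch of why this change should leave results unchanged, under stated assumptions: it models 16 rows with four partial values per row stored at smem_reduce(row * 4 + k), a 128-thread block, and row = tidx % 16. None of these sizes or the row mapping come from softmax.h, and the names Smem, combine4, reduce_old and reduce_new are hypothetical. The old path has threads 0..15 write each row's result to smem_reduce(row + 64), synchronize a second time, and read it back. The new path has every thread recombine the four partials itself after the single remaining __syncthreads().

```cpp
#include <array>
#include <cassert>
#include <functional>

// Hypothetical sizes (assumptions, not taken from softmax.h):
// 16 rows, 4 partial results per row, 128 threads, row = tidx % 16.
constexpr int kRows = 16;
constexpr int kPartials = 4;
constexpr int kThreads = 128;

// Shared-memory stand-in: 64 partials at [row * 4 + k], 16 finals at [row + 64].
using Smem = std::array<float, kRows * kPartials + kRows>;

// Combine the four partials of one row in the same order as the kernel.
template <typename Op>
float combine4(const Smem& s, int row, Op op) {
    return op(op(s[row * 4], s[row * 4 + 1]), op(s[row * 4 + 2], s[row * 4 + 3]));
}

// Old scheme: threads 0..15 each write one row's result to s[row + 64];
// after a second barrier, every thread reads its row's result back.
template <typename Op>
std::array<float, kThreads> reduce_old(Smem s, Op op) {
    for (int tidx = 0; tidx < 16; ++tidx) {
        int row = tidx % kRows;
        s[row + 64] = combine4(s, row, op);
    }
    // On the device, the second __syncthreads() sits here.
    std::array<float, kThreads> dst{};
    for (int tidx = 0; tidx < kThreads; ++tidx) {
        dst[tidx] = s[tidx % kRows + 64];
    }
    return dst;
}

// New scheme: no branch and no second barrier; every thread
// recombines the four partials for its own row.
template <typename Op>
std::array<float, kThreads> reduce_new(const Smem& s, Op op) {
    std::array<float, kThreads> dst{};
    for (int tidx = 0; tidx < kThreads; ++tidx) {
        dst[tidx] = combine4(s, tidx % kRows, op);
    }
    return dst;
}
```

Because both paths evaluate op(op(a, b), op(c, d)) in the same order on the same inputs, they should agree bit for bit even for a floating-point sum. The trade-off is that each thread now performs four shared-memory reads and three op calls instead of one read, in exchange for dropping a divergent branch (threads 0..15 vs. 16..31 of the first warp) and a block-wide barrier. The first __syncthreads() is presumably still required, since each thread reads partials written by other threads.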