gaoqiong / composable_kernel_ROCM
Commit 4f19d493, authored Jun 06, 2024 by Adam Osewski
Remove synchronization barriers which are unnecessary.
parent 6ebaa81e
Showing 1 changed file with 0 additions and 10 deletions
include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_splitk_xdl_cshuffle_tile_loop.hpp
...
@@ -159,16 +159,6 @@ __global__ void
 // Would be enough to keep it in registers and during AccumulatePartials
 // do CShuffle in flight with loading partials products of other peer workgroups.
 GridwiseGemm::StorePartials(p_workspace, static_cast<void*>(p_shared), results_buffer);
-#if 1
-__builtin_amdgcn_sched_barrier(0);
-// make sure all writes to gmem has finished.
-__builtin_amdgcn_s_waitcnt(0x0f70); // s_waitcnt vmcnt(0)
-// __builtin_amdgcn_s_waitcnt(0x0070); // s_waitcnt vmcnt(0) lgkmcnt(0)
-__builtin_amdgcn_s_barrier(); // s_barrier
-// __builtin_amdgcn_sched_barrier(0x0001); // allow all non-memory instructions to pass
-__builtin_amdgcn_sched_barrier(0);
-#endif
 work_scheduler.FlagFinished();
 // The workgroup which processed first K tile accumulates results and stores to GMEM
...
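
The deleted block passes raw immediates (0x0f70 and, in a commented-out line, 0x0070) to `__builtin_amdgcn_s_waitcnt`, with comments stating which counters they wait on. As a rough aid for reading those constants, the sketch below decodes an immediate on the host. It assumes the gfx9-style `s_waitcnt` layout (vmcnt in bits [3:0] plus [15:14], expcnt in bits [6:4], lgkmcnt in bits [11:8]); other GPU generations encode the fields differently. `WaitcntFields`, `decode_waitcnt_gfx9` and `check_commit_immediates` are hypothetical names for illustration only and are not part of composable_kernel.

```cpp
#include <cassert>
#include <cstdint>

// Counter thresholds packed into the 16-bit s_waitcnt immediate.
// ASSUMPTION: gfx9-style field layout; this is not taken from the commit itself.
struct WaitcntFields
{
    unsigned vmcnt;   // vector-memory counter threshold (6 bits, 63 = do not wait)
    unsigned expcnt;  // export counter threshold (3 bits, 7 = do not wait)
    unsigned lgkmcnt; // LDS/GDS/constant/message counter threshold (4 bits, 15 = do not wait)
};

// Hypothetical helper: split an immediate into its three counter fields.
constexpr WaitcntFields decode_waitcnt_gfx9(std::uint16_t imm)
{
    return WaitcntFields{
        // vmcnt is split: low 4 bits in [3:0], high 2 bits in [15:14]
        static_cast<unsigned>((imm & 0xF) | (((imm >> 14) & 0x3) << 4)),
        // expcnt lives in [6:4]
        static_cast<unsigned>((imm >> 4) & 0x7),
        // lgkmcnt lives in [11:8]
        static_cast<unsigned>((imm >> 8) & 0xF)};
}

// Checks the two immediates that appear in the deleted lines against their comments.
inline void check_commit_immediates()
{
    // 0x0f70: wait only until vmcnt reaches 0; expcnt and lgkmcnt stay at "do not wait".
    assert(decode_waitcnt_gfx9(0x0f70).vmcnt == 0);
    assert(decode_waitcnt_gfx9(0x0f70).expcnt == 7);
    assert(decode_waitcnt_gfx9(0x0f70).lgkmcnt == 15);

    // 0x0070: wait until both vmcnt and lgkmcnt reach 0.
    assert(decode_waitcnt_gfx9(0x0070).vmcnt == 0);
    assert(decode_waitcnt_gfx9(0x0070).lgkmcnt == 0);
}
```

Under that assumed layout, both constants decode to what the comments in the deleted lines say: 0x0f70 corresponds to `s_waitcnt vmcnt(0)` and 0x0070 to `s_waitcnt vmcnt(0) lgkmcnt(0)`.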