Commit 4f364c8e authored by Kirthi Shankar Sivamani, committed by GitHub

Fix out of bound ID passed to `cutlass::arch::NamedBarrier::sync` (#2554)



Fix barrier ID
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
parent c988548f
@@ -983,7 +983,9 @@ __launch_bounds__(512, 1) __global__ static void group_row_col_rht_gemm_device(
   Tensor tQAgSFA = thr_r2g_SFA.partition_S(gSFA_mn);
   Tensor tQArSFA = make_tensor_like(tQAgSFA(_, _, _, _0{}, _0{}));
-  int row_quant_barrier_id = 10;
+  // Will result in barrier_id=10 passed to bar.sync instr as cutlass adds 8
+  // in order to go over the reserved named barrier count.
+  constexpr int row_quant_barrier_id = 2;
   cutlass::arch::NamedBarrier::sync(NumEpilogueRowQuantThreadCount, row_quant_barrier_id);
   int group_idx = GetGroupIdx(&args, scheduler.tile_n_base() * size<1>(epilogue_tiler));
......
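
The arithmetic behind the fix can be sketched on the host. This is a rough model, not CUTLASS code: the constant and function names below are hypothetical. The offset of 8 is taken from the comment in the diff, and the limit of 16 assumes the usual `bar.sync` named barrier range of 0 to 15. Under those assumptions the old value of 10 maps to hardware barrier 18, which is out of range, while the new value of 2 maps to 10.

```cpp
#include <cassert>
#include <cstdint>

// Assumption (from the commit's comment): CUTLASS reserves the first 8 named barriers
// and adds that count to every user-supplied ID.
constexpr std::uint32_t kReservedNamedBarrierCount = 8;
// Assumption: the hardware exposes 16 named barriers per thread block (IDs 0..15).
constexpr std::uint32_t kHardwareMaxNamedBarriers = 16;

// Hypothetical helper: the ID that would reach the bar.sync instruction.
constexpr std::uint32_t hardware_barrier_id(std::uint32_t user_id) {
  return user_id + kReservedNamedBarrierCount;
}

// Hypothetical helper: true if the offset ID still fits in the hardware range.
constexpr bool is_valid_user_barrier_id(std::uint32_t user_id) {
  return hardware_barrier_id(user_id) < kHardwareMaxNamedBarriers;
}

// Runtime-checked variant for IDs that are not compile-time constants.
inline std::uint32_t checked_hardware_barrier_id(std::uint32_t user_id) {
  assert(is_valid_user_barrier_id(user_id));
  return hardware_barrier_id(user_id);
}

// New value from the commit: 2 + 8 = 10, inside 0..15.
static_assert(hardware_barrier_id(2) == 10, "user ID 2 maps to hardware barrier 10");
static_assert(is_valid_user_barrier_id(2), "user ID 2 is in range");
// Old value from the commit: 10 + 8 = 18, outside 0..15.
static_assert(!is_valid_user_barrier_id(10), "user ID 10 is out of range");
```

Declaring the ID as `constexpr`, as the commit does, also makes a compile-time check of this kind possible at the call site.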