Unverified commit 5b19b930 authored by Gregory Shtrasberg, committed by GitHub

[ROCm][Kernel] Using the correct warp_size value

parent 75404d04
@@ -207,8 +207,8 @@ __global__ void sgl_moe_align_block_size_kernel(
   __shared__ int32_t shared_counts[32][8];
   __shared__ int32_t local_offsets[256];
-  const int warp_id = threadIdx.x / WARP_SIZE;
-  const int lane_id = threadIdx.x % WARP_SIZE;
+  const int warp_id = threadIdx.x / 32;
+  const int lane_id = threadIdx.x % 32;
   const int experts_per_warp = 8;
   const int my_expert_start = warp_id * experts_per_warp;
...