Commit e0c910bb, authored by Thomas Parnell, committed by GitHub

[Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 (#28295)


Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
parent bf3ffb61
@@ -245,7 +245,7 @@ def _chunk_scan_fwd_kernel(
     )
 if not HAS_INITSTATES and (seq_idx != seq_idx_prev):
     prev_states = tl.zeros(
-        (BLOCK_SIZE_DSTATE, BLOCK_SIZE_K), dtype=C_ptr.dtype.element_ty
+        (BLOCK_SIZE_K, BLOCK_SIZE_N), dtype=C_ptr.dtype.element_ty
     )
 else:
     prev_states = tl.load(
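The following is a minimal sketch of why the shape of the zero-initialised block matters. It assumes, and this is not visible in the hunk, that `prev_states` is later multiplied by a `(BLOCK_SIZE_M, BLOCK_SIZE_K)` tile of `C` in a blocked loop over the state dimension. The helper `dot_shape` and the tile sizes are hypothetical and only mimic the shape rule of a 2-D matrix product; this is not Triton code.

```python
def dot_shape(a_shape, b_shape):
    """Return the result shape of a 2-D matrix product.

    Raises ValueError when the inner dimensions do not match.
    """
    (m, k_a), (k_b, n) = a_shape, b_shape
    if k_a != k_b:
        raise ValueError(f"inner dimensions differ: {k_a} vs {k_b}")
    return (m, n)


# Hypothetical tile sizes, chosen only for illustration.
BLOCK_SIZE_M, BLOCK_SIZE_N, BLOCK_SIZE_K = 64, 32, 64
BLOCK_SIZE_DSTATE = 256  # larger than 128, the case named in the commit title

# Assumed shape of the C tile inside the blocked loop (not shown in the diff).
c_tile = (BLOCK_SIZE_M, BLOCK_SIZE_K)

# Shape after the fix: the inner dimensions agree, giving an (M, N) tile.
new_shape = dot_shape(c_tile, (BLOCK_SIZE_K, BLOCK_SIZE_N))

# Shape before the fix: the inner dimensions disagree once DSTATE != K.
try:
    dot_shape(c_tile, (BLOCK_SIZE_DSTATE, BLOCK_SIZE_K))
    old_ok = True
except ValueError:
    old_ok = False
```

Under these assumptions, the pre-fix shape is only consistent when `BLOCK_SIZE_DSTATE` happens to equal `BLOCK_SIZE_K` and `BLOCK_SIZE_K` equals `BLOCK_SIZE_N`, which may explain why the problem surfaced only for larger state sizes.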