Centralize the 1024-thread launch-bounds annotation for the sparse index CUDA kernels and apply it consistently across the index, hash table, mask, and SubM indice helper kernels. This keeps the generated kernel definitions aligned with the launch configuration used by DTK runtime checks.
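
A minimal sketch of what a centralized annotation could look like, assuming a shared macro that every kernel definition reuses. All identifiers here (`SPARSE_INDEX_MAX_THREADS`, `SPARSE_INDEX_LAUNCH_BOUNDS`, `clamp_block_size`, `fill_mask_kernel`, `launch_fill_mask`) are hypothetical and not taken from this commit; the `__HIPCC__` check assumes a HIP-style compiler may also build the file.

```cpp
#include <cassert>

// Hypothetical names: none of these identifiers come from the commit itself.
// Single source of truth for the per-block thread limit.
#define SPARSE_INDEX_MAX_THREADS 1024

// Expand to the real annotation only under a device compiler, so the
// header still parses when included from plain host C++.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define SPARSE_INDEX_LAUNCH_BOUNDS __launch_bounds__(SPARSE_INDEX_MAX_THREADS)
#else
#define SPARSE_INDEX_LAUNCH_BOUNDS
#endif

// Host-side helper: keep a requested block size within the annotated bound,
// so the launch configuration cannot exceed what the kernel was compiled for.
inline int clamp_block_size(int requested) {
  assert(requested > 0);
  return requested < SPARSE_INDEX_MAX_THREADS ? requested
                                              : SPARSE_INDEX_MAX_THREADS;
}

#if defined(__CUDACC__) || defined(__HIPCC__)
// Illustrative kernel: every kernel in the group would carry the same macro
// instead of repeating a literal 1024 at each definition.
__global__ void SPARSE_INDEX_LAUNCH_BOUNDS
fill_mask_kernel(int* mask, int n, int value) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) mask[i] = value;
}

// Illustrative launcher on the default stream.
inline void launch_fill_mask(int* mask, int n, int value) {
  if (n <= 0) return;  // a zero-sized grid is an invalid launch
  int threads = clamp_block_size(SPARSE_INDEX_MAX_THREADS);
  int blocks = (n + threads - 1) / threads;
  fill_mask_kernel<<<blocks, threads>>>(mask, n, value);
}
#endif
```

Deriving both the annotation and the launch-side block size from one macro means a future change to the limit touches a single line, and the compiled bound and the launch configuration cannot drift apart.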
3610ebfa