Add launch bounds helper for sparse index kernels
Centralize the 1024-thread launch bound annotation for sparse index CUDA kernels in a single helper and apply it consistently across the index, hash table, mask, and SubM indice helper kernels. This keeps each kernel's declared maximum block size aligned with the launch configuration that the DTK runtime checks.
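As a rough illustration of the pattern described above, a minimal sketch might look like the following. The macro names `SPARSE_INDEX_MAX_THREADS` and `SPARSE_INDEX_LAUNCH_BOUNDS` and the kernel `fillIndices` are hypothetical and are not taken from this change. The sketch is written against the CUDA runtime API and assumes the DTK toolchain accepts the same `__launch_bounds__` qualifier.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: a single definition of the block-size bound,
// shared by every sparse index kernel.
#define SPARSE_INDEX_MAX_THREADS 1024
#define SPARSE_INDEX_LAUNCH_BOUNDS __launch_bounds__(SPARSE_INDEX_MAX_THREADS)

// Illustrative kernel (not from the source): writes each element's own index.
__global__ void SPARSE_INDEX_LAUNCH_BOUNDS fillIndices(int *out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = i;
}

int main() {
  const int n = 4096;
  int *d_out = nullptr;
  if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return 1;

  // Launch with the same block size the kernel was annotated with, so the
  // declared bound and the actual launch configuration cannot drift apart.
  const int threads = SPARSE_INDEX_MAX_THREADS;
  const int blocks = (n + threads - 1) / threads;
  fillIndices<<<blocks, threads>>>(d_out, n);
  if (cudaDeviceSynchronize() != cudaSuccess) return 1;

  std::vector<int> h_out(n);
  if (cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                 cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
  cudaFree(d_out);

  // Verify on the host that every element holds its own index.
  for (int i = 0; i < n; ++i) {
    if (h_out[i] != i) {
      std::printf("mismatch at %d\n", i);
      return 1;
    }
  }
  return 0;
}
```

Keeping the bound in one macro means a future change to the block size touches a single line instead of every kernel definition.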