    Add launch bounds helper for sparse index kernels · 3610ebfa
    one authored
    Centralize the 1024-thread launch bound annotation for sparse index
    CUDA kernels and apply it consistently across index, hash table, mask,
    and SubM indice helper kernels. This keeps generated kernel definitions
    aligned with the launch configuration used by DTK runtime checks.
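
    A minimal sketch of what such a centralized helper might look like,
    assuming the kernels are emitted as CUDA source strings from Python.
    The names `MAX_THREADS_PER_BLOCK`, `launch_bounds`, and
    `kernel_signature`, as well as the example kernel names, are
    hypothetical and are not taken from indices.py; only the 1024-thread
    bound comes from the commit message, and `__launch_bounds__` is the
    standard CUDA qualifier for this purpose.

```python
# Hypothetical sketch: helper and kernel names are illustrative only.
MAX_THREADS_PER_BLOCK = 1024  # the bound named in the commit message


def launch_bounds(max_threads: int = MAX_THREADS_PER_BLOCK) -> str:
    """Return the CUDA launch-bounds qualifier for a kernel definition."""
    if max_threads <= 0:
        raise ValueError("max_threads must be positive")
    return f"__launch_bounds__({max_threads})"


def kernel_signature(name: str, params: str) -> str:
    """Build a __global__ kernel signature carrying the shared annotation."""
    return f"__global__ void {launch_bounds()} {name}({params})"


if __name__ == "__main__":
    # Every kernel gets the same annotation from one place.
    for kernel in ("build_hash_table", "generate_mask"):
        print(kernel_signature(kernel, "int num_items"))
```

    Routing every kernel definition through one helper means the bound
    is changed in a single place if the launch configuration changes.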
indices.py 81.1 KB