    Adapt spconv for DTK SIMT fallback · a2dd956c
    one authored
    Add a DTK-specific kernel filter path for running spconv through the
    DTK CUDA compatibility layer on BW100. The recommended `dtk_simt`
    filter keeps only SIMT kernels, forces SIMT params to static codegen,
    and removes Volta/Turing/Ampere TensorOp, int8, and NVRTC paths from
    the active kernel set.
    
    Add `dtk_tensorop` as a separate non-default adaptation entry point for
    future Ampere TensorOp work. This keeps static non-int8 Ampere TensorOp
    params while still excluding Volta/Turing, int8, and NVRTC paths.
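
A minimal sketch of the two filter modes described above, not the actual spconv code. `KernelParams`, its fields, and `filter_kernels` are hypothetical stand-ins for the real kernel parameter objects. Keeping SIMT kernels under `dtk_tensorop` is also an assumption; the message only says which TensorOp params survive.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class KernelParams:
    # Hypothetical stand-in for a kernel parameter entry.
    name: str
    algo: str              # "Simt", "Volta", "Turing" or "Ampere"
    is_int8: bool = False
    is_nvrtc: bool = False  # False means static codegen


def filter_kernels(params: List[KernelParams],
                   mode: Optional[str]) -> List[KernelParams]:
    """Apply a DTK kernel filter; an empty mode leaves the set untouched."""
    if not mode:
        return list(params)
    if mode == "dtk_simt":
        # Keep only non-int8 SIMT kernels and force them to static codegen.
        return [replace(p, is_nvrtc=False)
                for p in params
                if p.algo == "Simt" and not p.is_int8]
    if mode == "dtk_tensorop":
        # Keep static, non-int8 Ampere TensorOp params.
        # Assumption: static SIMT kernels stay available as well.
        return [p for p in params
                if p.algo in ("Simt", "Ampere")
                and not p.is_int8 and not p.is_nvrtc]
    raise ValueError(f"unknown kernel filter: {mode!r}")
```

Volta/Turing, int8, and NVRTC entries drop out in both modes; only `dtk_simt` rewrites the surviving SIMT entries to static codegen.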
    
    Allow fp16 workloads to use the SIMT fallback when
    `SPCONV_DTK_KERNEL_FILTER` is set to `dtk_simt`. This updates both the
    Python tuner and the generated C++ ConvTunerSimple logic, so fp16 no
    longer depends on TensorOp paths that DTK does not currently support.
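
The fp16 gating could look roughly like the sketch below. `simt_allowed` is a hypothetical helper, not a function from the tuner. The assumption that non-fp16 dtypes were already allowed to use SIMT is inferred from this message, not confirmed by it.

```python
import os
from typing import Mapping, Optional


def simt_allowed(dtype: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether the tuner may consider SIMT kernels for `dtype`."""
    env = os.environ if env is None else env
    if dtype != "float16":
        # Assumption: only fp16 was previously restricted to TensorOp.
        return True
    # fp16 may fall back to SIMT only under the dtk_simt filter.
    return env.get("SPCONV_DTK_KERNEL_FILTER", "") == "dtk_simt"
```

The same predicate would have to be mirrored in the generated C++ ConvTunerSimple so that both tuners agree on the candidate set.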
    
    Add `SPCONV_FORCE_CUDA_ARCH` to keep runtime dispatch aligned with the
    compiled arch list, and pin the BW100 path explicitly to `9.3`.
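
One way the override could be read, as a sketch only. `resolve_arch` is a hypothetical name, and the `major.minor` value format is assumed from the `9.3` example above.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_arch(detected: Tuple[int, int],
                 env: Optional[Mapping[str, str]] = None) -> Tuple[int, int]:
    """Return the arch used for dispatch, honoring SPCONV_FORCE_CUDA_ARCH."""
    env = os.environ if env is None else env
    forced = env.get("SPCONV_FORCE_CUDA_ARCH", "").strip()
    if not forced:
        # No override: use whatever the runtime reported.
        return detected
    # Assumed format: "major.minor"; a bare "major" is read as minor 0.
    major, _, minor = forced.partition(".")
    return int(major), int(minor or 0)
```

With `SPCONV_FORCE_CUDA_ARCH=9.3` this returns `(9, 3)` regardless of the detected arch, so dispatch follows the arch the kernels were compiled for.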
    
    Adjust DTK build/runtime compatibility:
    - reuse the guarded editable-install state during constants setup
    - skip the Linux Thrust `-fno-gnu-unique` flag under the DTK inline-PTX
      compatibility path
    - add launch bounds to helper kernels that are launched with 1024
      threads/block
    
    This leaves full TensorOp, int8, fp8, and NVRTC support out of the
    recommended DTK path. Those remain future adaptation work.
convops.py 94.5 KB