Commit a2dd956c authored by one

Adapt spconv for DTK SIMT fallback

Add a DTK-specific kernel filter path for running spconv through the
DTK CUDA compatibility layer on BW100. The recommended `dtk_simt`
filter keeps only SIMT kernels, forces SIMT params to static codegen,
and removes Volta/Turing/Ampere TensorOp, int8, and NVRTC paths from
the active kernel set.

Add `dtk_tensorop` as a separate non-default adaptation entry point for
future Ampere TensorOp work. This keeps static non-int8 Ampere TensorOp
params while still excluding Volta/Turing, int8, and NVRTC paths.
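
The two filter modes described above can be sketched as a predicate over
kernel descriptors. This is only an illustration, not the code in this
commit: `KernelParams`, its fields, and `apply_dtk_filter` are
hypothetical names, and the sketch assumes that `dtk_tensorop` keeps the
SIMT kernels alongside the Ampere TensorOp ones, which the message does
not state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KernelParams:
    """Hypothetical kernel descriptor; not spconv's real parameter class."""
    algo: str       # "Simt", "Volta", "Turing", or "Ampere"
    is_int8: bool
    is_nvrtc: bool  # True means runtime (NVRTC) codegen, False means static


def apply_dtk_filter(params, mode):
    """Return the kernels that stay active under the given filter mode."""
    if not mode:
        return list(params)  # no filter requested: leave the set unchanged
    if mode not in ("dtk_simt", "dtk_tensorop"):
        raise ValueError(f"unknown DTK kernel filter: {mode!r}")
    kept = []
    for p in params:
        if p.is_int8:
            continue  # int8 paths are dropped in both modes
        if p.algo == "Simt":
            # SIMT params are forced to static codegen rather than removed.
            kept.append(replace(p, is_nvrtc=False))
        elif mode == "dtk_tensorop" and p.algo == "Ampere" and not p.is_nvrtc:
            kept.append(p)  # static, non-int8 Ampere TensorOp only
        # Volta/Turing TensorOp and NVRTC TensorOp params fall through.
    return kept
```

In use, the mode would presumably come from the `SPCONV_DTK_KERNEL_FILTER`
environment variable, e.g. `os.environ.get("SPCONV_DTK_KERNEL_FILTER", "")`.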

Allow fp16 workloads to use SIMT fallback when `SPCONV_DTK_KERNEL_FILTER`
is set to `dtk_simt`. This updates both the Python tuner and generated
C++ ConvTunerSimple logic so fp16 no longer depends on currently
unsupported TensorOp paths on DTK.
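
A minimal sketch of the fp16 gate, assuming that fp16 previously skipped
SIMT candidates and that other dtypes were always allowed to use them.
`simt_allowed` is a hypothetical helper, not the actual tuner or
ConvTunerSimple function; only the environment variable name and the
`dtk_simt` value come from this commit.

```python
import os


def simt_allowed(dtype, env=None):
    """Decide whether the tuner may consider SIMT kernels for this dtype."""
    env = os.environ if env is None else env
    if dtype != "float16":
        # Assumption: non-fp16 dtypes could already fall back to SIMT.
        return True
    # fp16 may use SIMT only when the DTK SIMT filter is active.
    return env.get("SPCONV_DTK_KERNEL_FILTER", "") == "dtk_simt"
```

Passing `env` explicitly keeps the check easy to exercise without touching
the process environment.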

Add `SPCONV_FORCE_CUDA_ARCH` to keep runtime dispatch aligned with the
compiled arch list, and keep the BW100 path explicit with `9.3`.
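
One way such an override could work, as a sketch only: `resolve_arch` is
a hypothetical helper, and the `major.minor` string format is inferred
from the `9.3` value mentioned above rather than taken from the code.

```python
import os


def resolve_arch(detected, env=None):
    """Return the (major, minor) arch to dispatch on.

    `detected` is the arch reported by the device; a non-empty
    SPCONV_FORCE_CUDA_ARCH value such as "9.3" takes precedence.
    """
    env = os.environ if env is None else env
    forced = env.get("SPCONV_FORCE_CUDA_ARCH", "").strip()
    if not forced:
        return detected
    major, _, minor = forced.partition(".")
    return (int(major), int(minor or 0))
```

With the variable set to `9.3`, dispatch would use `(9, 3)` regardless of
what the device reports, which keeps it in step with the compiled arch list.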

Adjust DTK build/runtime compatibility:
- reuse the guarded editable-install state during constants setup
- skip the Linux Thrust `-fno-gnu-unique` flag under the DTK inline-PTX
  compatibility path
- add launch bounds to helper kernels that are launched with 1024
  threads/block

This leaves full TensorOp, int8, fp8, and NVRTC support out of the
recommended DTK path. Those remain future adaptation work.

Parent: 263d6b47
Pipeline #3585 failed in 0 seconds