Adapt spconv for DTK SIMT fallback
Add a DTK-specific kernel filter path for running spconv through the DTK CUDA compatibility layer on BW100.

The recommended `dtk_simt` filter keeps only SIMT kernels, forces SIMT params to static codegen, and removes Volta/Turing/Ampere TensorOp, int8, and NVRTC paths from the active kernel set.

Add `dtk_tensorop` as a separate non-default adaptation entry point for future Ampere TensorOp work. This keeps static non-int8 Ampere TensorOp params while still excluding Volta/Turing, int8, and NVRTC paths.

Allow fp16 workloads to use SIMT fallback when `SPCONV_DTK_KERNEL_FILTER` is set to `dtk_simt`. This updates both the Python tuner and generated C++ ConvTunerSimple logic so fp16 no longer depends on currently unsupported TensorOp paths on DTK.

Add `SPCONV_FORCE_CUDA_ARCH` to keep runtime dispatch aligned with the compiled arch list, and keep the BW100 path explicit with `9.3`.

Adjust DTK build/runtime compatibility:
- reuse the guarded editable-install state during constants setup
- skip the Linux Thrust `-fno-gnu-unique` flag under the DTK inline-PTX compatibility path
- add launch bounds to helper kernels that are launched with 1024 threads/block

This leaves full TensorOp, int8, fp8, and NVRTC support out of the recommended DTK path. Those remain future adaptation work.
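
The two filter modes described above can be sketched roughly as follows. This is an illustrative model only, not the code in this change: `KernelDesc`, its fields, `filter_kernels`, and the sample kernel names are hypothetical stand-ins for spconv's real kernel params. It also assumes that `dtk_tensorop` retains static SIMT kernels (the description only says what it excludes) and that any other value of `SPCONV_DTK_KERNEL_FILTER` leaves the kernel set unchanged.

```python
import os
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KernelDesc:
    # Hypothetical descriptor; the real spconv kernel params carry more fields.
    name: str
    arch_family: str          # "Simt", "Volta", "Turing", or "Ampere"
    is_int8: bool = False
    codegen: str = "static"   # "static" or "nvrtc"


def filter_kernels(kernels, mode):
    """Return the kernels that stay active under the given filter mode."""
    if mode not in ("dtk_simt", "dtk_tensorop"):
        return list(kernels)  # assumption: no DTK filter means no change
    kept = []
    for k in kernels:
        if k.is_int8:
            continue  # int8 is excluded in both modes
        if mode == "dtk_simt":
            # Keep only SIMT kernels and force them to static codegen.
            if k.arch_family == "Simt":
                kept.append(replace(k, codegen="static"))
        else:
            # dtk_tensorop: exclude Volta/Turing and NVRTC paths.
            if k.arch_family in ("Volta", "Turing") or k.codegen == "nvrtc":
                continue
            kept.append(k)
    return kept


# Made-up kernel set used only to exercise the sketch.
SAMPLE = [
    KernelDesc("simt_f32", "Simt"),
    KernelDesc("simt_f16_nvrtc", "Simt", codegen="nvrtc"),
    KernelDesc("volta_f16", "Volta"),
    KernelDesc("turing_f16", "Turing"),
    KernelDesc("ampere_f16", "Ampere"),
    KernelDesc("ampere_s8", "Ampere", is_int8=True),
]

if __name__ == "__main__":
    mode = os.environ.get("SPCONV_DTK_KERNEL_FILTER", "")
    for k in filter_kernels(SAMPLE, mode):
        print(k.name, k.codegen)
```

Running the sketch with `SPCONV_DTK_KERNEL_FILTER=dtk_simt` leaves only the two SIMT entries, both with static codegen, which is the property that lets fp16 workloads avoid the TensorOp paths.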

| Name | Stage | Status | Failure |
|---|---|---|---|
| test | Test | failed | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| build | Build | failed | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| code_quality | Test | failed | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |