1. 09 May, 2026 1 commit
      Adapt spconv for DTK SIMT fallback · a2dd956c
      one authored
      Add a DTK-specific kernel filter path for running spconv through the
      DTK CUDA compatibility layer on BW100. The recommended `dtk_simt`
      filter keeps only SIMT kernels, forces SIMT params to static codegen,
      and removes Volta/Turing/Ampere TensorOp, int8, and NVRTC paths from
      the active kernel set.
      
      Add `dtk_tensorop` as a separate non-default adaptation entry point for
      future Ampere TensorOp work. This keeps static non-int8 Ampere TensorOp
      params while still excluding Volta/Turing, int8, and NVRTC paths.
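
The two filter modes described above can be sketched as a predicate over kernel descriptors. This is only an illustration: `KernelDesc`, its fields, and `keep_kernel` are hypothetical names, not spconv's real parameter classes, and whether `dtk_tensorop` also retains SIMT kernels is an assumption made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelDesc:
    """Hypothetical kernel descriptor (not spconv's real params class)."""
    name: str
    family: str            # "simt", "volta", "turing", or "ampere"
    is_int8: bool = False
    is_nvrtc: bool = False


def keep_kernel(k: KernelDesc, mode: str) -> bool:
    """Return True if kernel `k` stays in the active set under `mode`."""
    # Both modes drop int8 and NVRTC kernels.
    if k.is_int8 or k.is_nvrtc:
        return False
    if mode == "dtk_simt":
        # Recommended path: SIMT kernels only.
        return k.family == "simt"
    if mode == "dtk_tensorop":
        # Assumption: SIMT stays available next to Ampere TensorOp;
        # Volta/Turing TensorOp kernels are excluded.
        return k.family in ("simt", "ampere")
    raise ValueError(f"unknown filter mode: {mode!r}")


kernels = [
    KernelDesc("simt_f32", "simt"),
    KernelDesc("volta_f16", "volta"),
    KernelDesc("turing_f16", "turing"),
    KernelDesc("ampere_f16", "ampere"),
    KernelDesc("ampere_s8", "ampere", is_int8=True),
    KernelDesc("simt_f32_nvrtc", "simt", is_nvrtc=True),
]
simt_names = [k.name for k in kernels if keep_kernel(k, "dtk_simt")]
tensorop_names = [k.name for k in kernels if keep_kernel(k, "dtk_tensorop")]
```

With this sample list, `dtk_simt` keeps only `simt_f32`, while `dtk_tensorop` additionally keeps `ampere_f16`.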
      
      Allow fp16 workloads to use SIMT fallback when `SPCONV_DTK_KERNEL_FILTER`
      is set to `dtk_simt`. This updates both the Python tuner and generated
      C++ ConvTunerSimple logic so fp16 no longer depends on currently
      unsupported TensorOp paths on DTK.
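
A minimal sketch of the fp16 gating described above, assuming the tuner decides per dtype whether SIMT kernels are acceptable candidates. Only the variable name `SPCONV_DTK_KERNEL_FILTER` and the value `dtk_simt` come from this commit; `simt_allowed` and its default rule are hypothetical and do not reproduce the actual tuner code.

```python
import os


def simt_allowed(dtype: str, env=None) -> bool:
    """Decide whether SIMT kernels may serve `dtype` (hypothetical helper)."""
    env = os.environ if env is None else env
    if dtype != "fp16":
        # Assumption: non-fp16 dtypes could already use SIMT kernels.
        return True
    # fp16 normally expects TensorOp; the DTK filter opts in to SIMT fallback.
    return env.get("SPCONV_DTK_KERNEL_FILTER", "") == "dtk_simt"


without_filter = simt_allowed("fp16", env={})
with_filter = simt_allowed("fp16", env={"SPCONV_DTK_KERNEL_FILTER": "dtk_simt"})
```

Passing `env` explicitly keeps the helper testable without mutating the process environment.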
      
      Add `SPCONV_FORCE_CUDA_ARCH` to keep runtime dispatch aligned with the
      compiled arch list, and keep the BW100 path explicit with `9.3`.
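
As an illustration of how such an override might be applied, the sketch below parses `SPCONV_FORCE_CUDA_ARCH` into a `(major, minor)` pair and prefers it over a detected value. The `major.minor` format follows the `9.3` example in this message; `resolve_arch` and the parsing details are assumptions, not the commit's actual implementation.

```python
import os


def resolve_arch(detected, env=None):
    """Return the forced (major, minor) arch if set, else `detected`."""
    env = os.environ if env is None else env
    forced = env.get("SPCONV_FORCE_CUDA_ARCH", "").strip()
    if not forced:
        return detected
    # Split "9.3" into "9" and "3"; a bare "9" is treated as minor 0.
    major, _, minor = forced.partition(".")
    return (int(major), int(minor or 0))


forced_arch = resolve_arch((8, 0), env={"SPCONV_FORCE_CUDA_ARCH": "9.3"})
default_arch = resolve_arch((8, 0), env={})
```

Here `forced_arch` is `(9, 3)` and `default_arch` falls back to the detected `(8, 0)`.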
      
      Adjust DTK build/runtime compatibility:
      - reuse the guarded editable-install state during constants setup
      - skip the Linux Thrust `-fno-gnu-unique` flag under the DTK inline-PTX
        compatibility path
      - add launch bounds to helper kernels that are launched with 1024
        threads/block
      
      This leaves full TensorOp, int8, fp8, and NVRTC support out of the
      recommended DTK path. Those remain future adaptation work.
  2. 15 Dec, 2024 3 commits
  3. 10 Dec, 2024 1 commit
  4. 09 Dec, 2024 1 commit
  5. 19 Apr, 2023 1 commit
  6. 23 Mar, 2023 3 commits
  7. 02 Feb, 2023 2 commits
  8. 20 Jan, 2023 2 commits
  9. 19 Jan, 2023 2 commits
  10. 17 Jan, 2023 1 commit
  11. 10 Jan, 2023 1 commit
  12. 04 Jan, 2023 1 commit
  13. 03 Jan, 2023 1 commit
  14. 29 Dec, 2022 1 commit
  15. 27 Dec, 2022 1 commit
  16. 05 Nov, 2022 2 commits
  17. 26 Oct, 2022 1 commit
  18. 18 Oct, 2022 2 commits
  19. 28 Sep, 2022 1 commit
  20. 26 Sep, 2022 3 commits
  21. 25 Sep, 2022 5 commits
  22. 24 Sep, 2022 4 commits