  1. 12 May, 2026 1 commit
    • Add launch bounds helper for sparse index kernels · 3610ebfa
      one authored
      Centralize the 1024-thread launch bound annotation for sparse index
      CUDA kernels and apply it consistently across index, hash table, mask,
      and SubM indice helper kernels. This keeps generated kernel definitions
      aligned with the launch configuration used by DTK runtime checks.
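
A minimal sketch of what a centralized annotation helper could look like on the code-generation side. The names `MAX_THREADS_PER_BLOCK`, `launch_bounds`, and `kernel_decl` are hypothetical and are not taken from the commit; only the 1024-thread bound and the standard CUDA `__launch_bounds__` qualifier come from the message above.

```python
# Hypothetical helper: one place that decides the launch-bound annotation
# emitted into generated kernel definitions.
from typing import Sequence

MAX_THREADS_PER_BLOCK = 1024  # block size named in the commit message


def launch_bounds(max_threads: int = MAX_THREADS_PER_BLOCK) -> str:
    """Return the CUDA qualifier text for the given thread bound."""
    if not 0 < max_threads <= 1024:
        raise ValueError(f"unsupported thread bound: {max_threads}")
    return f"__launch_bounds__({max_threads})"


def kernel_decl(name: str, params: Sequence[str]) -> str:
    """Build a kernel declaration that always carries the shared bound."""
    return f"__global__ void {launch_bounds()} {name}({', '.join(params)})"


if __name__ == "__main__":
    print(kernel_decl("build_hash_table", ["int* keys", "int n"]))
```

Routing every helper kernel through one function like this means the bound can only drift from the launch configuration in a single place.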
  2. 09 May, 2026 1 commit
    • Adapt spconv for DTK SIMT fallback · a2dd956c
      one authored
      Add a DTK-specific kernel filter path for running spconv through the
      DTK CUDA compatibility layer on BW100. The recommended `dtk_simt`
      filter keeps only SIMT kernels, forces SIMT params to static codegen,
      and removes Volta/Turing/Ampere TensorOp, int8, and NVRTC paths from
      the active kernel set.
      
      Add `dtk_tensorop` as a separate non-default adaptation entry point for
      future Ampere TensorOp work. This keeps static non-int8 Ampere TensorOp
      params while still excluding Volta/Turing, int8, and NVRTC paths.
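
The two filter modes described above can be sketched roughly as follows. `KernelParams` and `apply_dtk_filter` are hypothetical stand-ins, not spconv's actual types. The message does not say whether `dtk_tensorop` also retains SIMT kernels; this sketch assumes it does, and assumes SIMT params are forced to static codegen in both modes.

```python
# Hypothetical model of the DTK kernel filter; field names are illustrative.
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass(frozen=True)
class KernelParams:
    name: str
    algo: str               # "Simt", "Volta", "Turing", or "Ampere"
    is_int8: bool = False
    is_nvrtc: bool = False  # True means runtime-compiled, False means static


def apply_dtk_filter(params: Iterable[KernelParams], mode: str) -> List[KernelParams]:
    if mode not in ("dtk_simt", "dtk_tensorop"):
        raise ValueError(f"unknown filter mode: {mode}")
    # Assumption: dtk_tensorop keeps SIMT as a fallback next to Ampere.
    allowed = {"Simt"} if mode == "dtk_simt" else {"Simt", "Ampere"}
    kept = []
    for p in params:
        if p.algo not in allowed or p.is_int8:
            continue  # drops Volta/Turing (and Ampere under dtk_simt) and int8
        if p.algo == "Simt":
            p = replace(p, is_nvrtc=False)  # force SIMT to static codegen
        elif p.is_nvrtc:
            continue  # only static Ampere TensorOp params survive
        kept.append(p)
    return kept
```

For example, a list holding an NVRTC SIMT entry, an int8 SIMT entry, a Volta entry, and a static Ampere entry reduces to the single SIMT entry (now static) under `dtk_simt`, and to the SIMT and Ampere entries under `dtk_tensorop`.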
      
      Allow fp16 workloads to use SIMT fallback when `SPCONV_DTK_KERNEL_FILTER`
      is set to `dtk_simt`. This updates both the Python tuner and generated
      C++ ConvTunerSimple logic so fp16 no longer depends on currently
      unsupported TensorOp paths on DTK.
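
As an illustration of the fp16 fallback rule, the check might reduce to something like the sketch below. Only the environment variable name and the `dtk_simt` value come from the message above; `fp16_simt_allowed` and `select_algos` are hypothetical, and the real tuner logic is certainly more involved (for instance, fp32 is simplified here to SIMT only).

```python
# Hypothetical tuner gate: fp16 may use SIMT only when the DTK filter asks for it.
import os
from typing import List, Mapping


def fp16_simt_allowed(env: Mapping[str, str] = os.environ) -> bool:
    return env.get("SPCONV_DTK_KERNEL_FILTER", "") == "dtk_simt"


def select_algos(dtype: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the algorithm families a tuner would consider for `dtype`."""
    if dtype == "float16":
        # Without the filter, fp16 is assumed to require TensorOp kernels.
        return ["Simt"] if fp16_simt_allowed(env) else ["TensorOp"]
    return ["Simt"]
```

Passing the environment as a mapping keeps the rule easy to exercise in tests without touching the process environment.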
      
      Add `SPCONV_FORCE_CUDA_ARCH` to keep runtime dispatch aligned with the
      compiled arch list, and keep the BW100 path explicit with `9.3`.
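
A sketch of how such an override could be resolved at runtime, assuming the variable holds a `major.minor` string such as `9.3`. The functions `parse_arch` and `resolve_arch` are hypothetical names, and the exact accepted format is not stated in the commit.

```python
# Hypothetical resolution of SPCONV_FORCE_CUDA_ARCH against a detected arch.
import os
from typing import Mapping, Tuple

Arch = Tuple[int, int]


def parse_arch(value: str) -> Arch:
    """Parse 'major.minor' (assumed format) into an integer pair."""
    major, sep, minor = value.strip().partition(".")
    if not sep or not major.isdigit() or not minor.isdigit():
        raise ValueError(f"expected 'major.minor', got {value!r}")
    return int(major), int(minor)


def resolve_arch(detected: Arch, env: Mapping[str, str] = os.environ) -> Arch:
    """Prefer the forced arch when set, otherwise keep the detected one."""
    forced = env.get("SPCONV_FORCE_CUDA_ARCH")
    return parse_arch(forced) if forced else detected
```

With the variable set to `9.3`, dispatch would use `(9, 3)` regardless of what the device reports; with it unset, the detected value passes through unchanged.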
      
      Adjust DTK build/runtime compatibility:
      - reuse the guarded editable-install state during constants setup
      - skip the Linux Thrust `-fno-gnu-unique` flag under the DTK inline-PTX
        compatibility path
      - add launch bounds to helper kernels that are launched with 1024
        threads/block
      
      This leaves full TensorOp, int8, fp8, and NVRTC support out of the
      recommended DTK path. Those remain future adaptation work.
  3. 15 Dec, 2024 3 commits
  4. 10 Dec, 2024 1 commit
  5. 09 Dec, 2024 1 commit
  6. 19 Apr, 2023 1 commit
  7. 23 Mar, 2023 3 commits
  8. 02 Feb, 2023 2 commits
  9. 20 Jan, 2023 2 commits
  10. 19 Jan, 2023 2 commits
  11. 17 Jan, 2023 1 commit
  12. 10 Jan, 2023 1 commit
  13. 04 Jan, 2023 1 commit
  14. 03 Jan, 2023 1 commit
  15. 29 Dec, 2022 1 commit
  16. 27 Dec, 2022 1 commit
  17. 05 Nov, 2022 2 commits
  18. 26 Oct, 2022 1 commit
  19. 18 Oct, 2022 2 commits
  20. 28 Sep, 2022 1 commit
  21. 26 Sep, 2022 3 commits
  22. 25 Sep, 2022 5 commits
  23. 24 Sep, 2022 3 commits