- 09 May, 2026 1 commit
one authored
Add a DTK-specific kernel filter path for running spconv through the DTK CUDA compatibility layer on BW100.

The recommended `dtk_simt` filter keeps only SIMT kernels, forces SIMT params to static codegen, and removes Volta/Turing/Ampere TensorOp, int8, and NVRTC paths from the active kernel set.

Add `dtk_tensorop` as a separate non-default adaptation entry point for future Ampere TensorOp work. This keeps static non-int8 Ampere TensorOp params while still excluding Volta/Turing, int8, and NVRTC paths.

Allow fp16 workloads to use SIMT fallback when `SPCONV_DTK_KERNEL_FILTER` is set to `dtk_simt`. This updates both the Python tuner and generated C++ ConvTunerSimple logic so fp16 no longer depends on currently unsupported TensorOp paths on DTK.

Add `SPCONV_FORCE_CUDA_ARCH` to keep runtime dispatch aligned with the compiled arch list, and keep the BW100 path explicit with `9.3`.

Adjust DTK build/runtime compatibility:
- reuse the guarded editable-install state during constants setup
- skip the Linux Thrust `-fno-gnu-unique` flag under the DTK inline-PTX compatibility path
- add launch bounds to helper kernels that are launched with 1024 threads/block

This leaves full TensorOp, int8, fp8, and NVRTC support out of the recommended DTK path. Those remain future adaptation work.
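The filter rules described in this commit message can be sketched roughly as below. This is an illustrative sketch only, not spconv's actual code: `KernelDesc`, `filter_kernels`, and `fp16_may_use_simt` are hypothetical names, and the sketch assumes that an unset `SPCONV_DTK_KERNEL_FILTER` means no filtering and that `dtk_tensorop` also keeps the SIMT kernels. Only the environment variable name and the two filter values come from the message itself.

```python
import os
from dataclasses import dataclass, replace
from typing import List

# Hypothetical kernel descriptor; spconv's real kernel param classes differ.
@dataclass(frozen=True)
class KernelDesc:
    name: str
    arch: str        # "simt", "volta", "turing", or "ampere"
    tensorop: bool   # True for TensorOp kernels, False for SIMT kernels
    int8: bool
    nvrtc: bool      # True if the kernel would be compiled at runtime via NVRTC


def filter_kernels(kernels: List[KernelDesc], mode: str) -> List[KernelDesc]:
    """Apply the DTK filter rules from the commit message to a kernel list."""
    if not mode:
        # Assumption: an empty or unset filter leaves the kernel set unchanged.
        return list(kernels)
    if mode not in ("dtk_simt", "dtk_tensorop"):
        raise ValueError(f"unknown kernel filter: {mode!r}")
    kept = []
    for k in kernels:
        if k.int8:
            continue  # int8 paths are excluded in both modes
        if not k.tensorop:
            # SIMT kernels are kept and forced to static codegen (no NVRTC).
            kept.append(replace(k, nvrtc=False))
        elif mode == "dtk_tensorop" and k.arch == "ampere" and not k.nvrtc:
            # Only static, non-int8 Ampere TensorOp kernels survive here.
            kept.append(k)
    return kept


def fp16_may_use_simt(mode: str) -> bool:
    """fp16 workloads may fall back to SIMT only under the dtk_simt filter."""
    return mode == "dtk_simt"


if __name__ == "__main__":
    mode = os.environ.get("SPCONV_DTK_KERNEL_FILTER", "")
    demo = [
        KernelDesc("simt_f32", "simt", False, False, True),
        KernelDesc("ampere_f16", "ampere", True, False, False),
        KernelDesc("turing_f16", "turing", True, False, False),
    ]
    for k in filter_kernels(demo, mode):
        print(k.name, "static" if not k.nvrtc else "nvrtc")
```

Running the script with `SPCONV_DTK_KERNEL_FILTER=dtk_simt` in the environment exercises the recommended path; the descriptors in the demo list are made up for illustration.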
- 15 Dec, 2024 3 commits
- 10 Dec, 2024 1 commit
yan.yan authored
- 09 Dec, 2024 1 commit
yan.yan authored
- 19 Apr, 2023 1 commit
yan.yan authored
- 23 Mar, 2023 3 commits
- 02 Feb, 2023 2 commits
- 20 Jan, 2023 2 commits
- 19 Jan, 2023 2 commits
- 17 Jan, 2023 1 commit
yan.yan authored
- 10 Jan, 2023 1 commit
yan.yan authored
- 04 Jan, 2023 1 commit
yan.yan authored
- 03 Jan, 2023 1 commit
yan.yan authored
- 29 Dec, 2022 1 commit
yan.yan authored
- 27 Dec, 2022 1 commit
FindDefinition authored
* large kernel bwd&bwdI, not test increment RS
* large kernel fix, no split_mask and increment rs
* large kernel fix2, no split_mask and increment rs
* reset benchmark.py
* fix merge

Co-authored-by: EvernightAurora <2465542858@qq.com>
- 05 Nov, 2022 2 commits
- 26 Oct, 2022 1 commit
yan.yan authored
- 18 Oct, 2022 2 commits
- 28 Sep, 2022 1 commit
yan.yan authored
- 26 Sep, 2022 3 commits
- 25 Sep, 2022 5 commits
- 24 Sep, 2022 4 commits