Commits · 813bf014440fad94132dc07714f3210291152ef4 · one / spconv

08 Jun, 2026 1 commit

spconv: filter DTK MaskImplicitGemm descriptors · 813bf014

one authored May 14, 2026

Limit the DTK SIMT MaskImplicitGemm forward descriptor set for the
MV2DFusion VVM/SubM fp32 shape family on sm_93.

For kv=27, input channels=128, and output channels=64, keep only the
SIMT descriptor validated by dense-oracle replay. Apply the same
selection rule in both the Python tuner path and generated C++ tuner
path to keep build-time and runtime behavior aligned.
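A minimal sketch of the selection rule, assuming descriptors can be reduced to a name plus a SIMT flag. `Descriptor`, `filter_forward_descriptors` and the `validated` allow-list are hypothetical; the commit does not name the validated descriptor. Shapes outside the kv=27, Cin=128, Cout=64 fp32 family on sm_93 pass through unchanged, and the generated C++ tuner would need to mirror the same predicate (not shown).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    # Hypothetical stand-in for a tuner kernel descriptor.
    name: str
    is_simt: bool


def in_target_family(kv, c_in, c_out, dtype, arch):
    """True for the kv=27, Cin=128, Cout=64 fp32 shape family on sm_93."""
    return (kv, c_in, c_out, dtype, arch) == (27, 128, 64, "float32", (9, 3))


def filter_forward_descriptors(descs, kv, c_in, c_out, dtype, arch, validated):
    """Keep only validated SIMT descriptors for the targeted shape family.

    `validated` is a hypothetical allow-list of descriptor names; shapes
    outside the family are returned unchanged.
    """
    if not in_target_family(kv, c_in, c_out, dtype, arch):
        return list(descs)
    return [d for d in descs if d.is_simt and d.name in validated]
```

Keeping the predicate in one pure function makes it easy to port verbatim to the C++ side, which is what keeps build-time and runtime selection in agreement.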

12 May, 2026 1 commit

Add launch bounds helper for sparse index kernels · 3610ebfa

one authored May 12, 2026

Centralize the 1024-thread launch bound annotation for sparse index
CUDA kernels and apply it consistently across index, hash table, mask,
and SubM indice helper kernels. This keeps generated kernel definitions
aligned with the launch configuration used by DTK runtime checks.
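A sketch of what a centralized helper could look like in a Python code generator, assuming kernel signatures are emitted as CUDA source strings. `SPARSE_INDEX_MAX_THREADS`, `launch_bounds`, `kernel_decl` and `check_block_size` are hypothetical names; only the `__launch_bounds__` qualifier itself is standard CUDA.

```python
# Hypothetical constant: the block size the sparse index helpers launch with.
SPARSE_INDEX_MAX_THREADS = 1024


def launch_bounds(max_threads=SPARSE_INDEX_MAX_THREADS):
    """Return the CUDA __launch_bounds__ qualifier for a kernel definition."""
    return f"__launch_bounds__({max_threads})"


def kernel_decl(name, params, max_threads=SPARSE_INDEX_MAX_THREADS):
    """Render an annotated __global__ kernel declaration as CUDA source text."""
    return f"__global__ void {launch_bounds(max_threads)} {name}({', '.join(params)})"


def check_block_size(threads, max_threads=SPARSE_INDEX_MAX_THREADS):
    """Reject launch configs that exceed the annotated bound.

    CUDA fails a launch that uses more threads per block than the kernel's
    __launch_bounds__ declares, so checking here keeps codegen and launch in sync.
    """
    if threads > max_threads:
        raise ValueError(f"{threads} threads/block exceeds launch bound {max_threads}")
```

Routing every kernel through one constant means the annotation and the launch configuration cannot drift apart silently.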

09 May, 2026 1 commit

Adapt spconv for DTK SIMT fallback · a2dd956c

one authored May 09, 2026

Add a DTK-specific kernel filter path for running spconv through the
DTK CUDA compatibility layer on BW100. The recommended `dtk_simt`
filter keeps only SIMT kernels, forces SIMT params to static codegen,
and removes Volta/Turing/Ampere TensorOp, int8, and NVRTC paths from
the active kernel set.

Add `dtk_tensorop` as a separate non-default adaptation entry point for
future Ampere TensorOp work. This keeps static non-int8 Ampere TensorOp
params while still excluding Volta/Turing, int8, and NVRTC paths.
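A minimal sketch of the two filter modes, assuming each kernel param can be summarized by its arch family, an int8 flag and an NVRTC flag. `KernelParams`, `dtk_filter_mode` and `apply_dtk_filter` are hypothetical. Whether `dtk_tensorop` also keeps SIMT kernels is not stated above; the sketch assumes it does.

```python
import os
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KernelParams:
    # Hypothetical record; field names are illustrative only.
    arch: str        # "simt", "volta", "turing" or "ampere"
    is_int8: bool
    use_nvrtc: bool


def dtk_filter_mode():
    """Read SPCONV_DTK_KERNEL_FILTER ("" when unset)."""
    return os.environ.get("SPCONV_DTK_KERNEL_FILTER", "")


def apply_dtk_filter(params, mode):
    if mode not in ("dtk_simt", "dtk_tensorop"):
        return list(params)  # no DTK filtering requested
    kept = []
    for p in params:
        if p.is_int8 or p.arch in ("volta", "turing"):
            continue  # excluded by both modes
        if p.arch == "simt":
            # Force SIMT params to static codegen instead of NVRTC.
            kept.append(replace(p, use_nvrtc=False))
        elif mode == "dtk_tensorop" and p.arch == "ampere" and not p.use_nvrtc:
            kept.append(p)  # static, non-int8 Ampere TensorOp only
    return kept
```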

Allow fp16 workloads to use SIMT fallback when `SPCONV_DTK_KERNEL_FILTER`
is set to `dtk_simt`. This updates both the Python tuner and generated
C++ ConvTunerSimple logic so fp16 no longer depends on currently
unsupported TensorOp paths on DTK.
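A sketch of the fp16 gate, assuming the tuner reasons about candidate kernel kinds as plain strings. `simt_allowed` and `tuner_candidates` are hypothetical, and treating non-fp16 dtypes as always SIMT-eligible is an assumption. The generated C++ ConvTunerSimple logic would need the same branch.

```python
def simt_allowed(dtype, filter_mode):
    """Whether the tuner may pick a SIMT kernel for this dtype (sketch)."""
    if dtype == "float16":
        # fp16 only falls back to SIMT when the dtk_simt filter is active.
        return filter_mode == "dtk_simt"
    return True  # assumption: other dtypes were already SIMT-eligible


def tuner_candidates(kinds, dtype, filter_mode):
    """Drop SIMT candidates that the dtype/filter combination does not allow."""
    return [k for k in kinds if k != "simt" or simt_allowed(dtype, filter_mode)]
```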

Add `SPCONV_FORCE_CUDA_ARCH` to keep runtime dispatch aligned with the
compiled arch list, and keep the BW100 path explicit with `9.3`.
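A sketch of how an override such as `SPCONV_FORCE_CUDA_ARCH=9.3` could be resolved against the compiled arch list, assuming arches are `(major, minor)` tuples. `parse_arch` and `dispatch_arch` are hypothetical, and raising on an arch that was not compiled is an assumed policy.

```python
import os


def parse_arch(text):
    """Parse a 'major.minor' string such as '9.3' into (9, 3)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def dispatch_arch(detected, compiled, env=None):
    """Pick the arch used for kernel dispatch.

    SPCONV_FORCE_CUDA_ARCH, when set, overrides the detected device arch;
    either way the result must be in the compiled arch list.
    """
    env = os.environ if env is None else env
    forced = env.get("SPCONV_FORCE_CUDA_ARCH")
    arch = parse_arch(forced) if forced else detected
    if arch not in compiled:
        raise RuntimeError(f"arch {arch[0]}.{arch[1]} is not in the compiled arch list")
    return arch
```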

Adjust DTK build/runtime compatibility:
- reuse the guarded editable-install state during constants setup
- skip the Linux Thrust `-fno-gnu-unique` flag under the DTK inline-PTX
  compatibility path
- add launch bounds to helper kernels that are launched with 1024
  threads/block

This leaves full TensorOp, int8, fp8, and NVRTC support out of the
recommended DTK path. Those remain future adaptation work.

15 Dec, 2024 3 commits
- change all builds back to windows-2019 · 263d6b47
  yan.yan authored Dec 15, 2024
- fix windows · 76948563
  yan.yan authored Dec 15, 2024
- fix windows prebuilt problem · 5153528c
  yan.yan authored Dec 15, 2024
10 Dec, 2024 1 commit
- fix compile problem · edf48feb
  yan.yan authored Dec 10, 2024
09 Dec, 2024 1 commit
- add new prebuilts for 3.13 and cu126 · f0ffe617
  yan.yan authored Dec 09, 2024
19 Apr, 2023 1 commit
- fix #570 cpu package CI bug · 125a194d
  yan.yan authored Apr 19, 2023
23 Mar, 2023 3 commits
- fix #575 use a flag to enable large-kernel algo · cd99e7a6
  yan.yan authored Mar 24, 2023
- new version number · f101f97e
  yan.yan authored Mar 24, 2023
- v2.3.4: global pool, generative model support · f582ec34
  yan.yan authored Mar 23, 2023
02 Feb, 2023 2 commits
- add some examples · 004effbd
  yan.yan authored Feb 02, 2023
- v2.3.3: fix some problem in int8 · 2309ebe5
  yan.yan authored Feb 02, 2023
20 Jan, 2023 2 commits
- change version to use cumm 0.4.5 · b52636d1
  yan.yan authored Jan 20, 2023
- change cumm version · 616fabee
  yan.yan authored Jan 20, 2023
19 Jan, 2023 2 commits
- fix windows cuda script · eb6fa7c2
  yan.yan authored Jan 19, 2023
- v2.3.0: int8 quantization · 42c7cdad
  yan.yan authored Jan 19, 2023
17 Jan, 2023 1 commit
- prepare int8 release · 1f6deed6
  yan.yan authored Jan 17, 2023
10 Jan, 2023 1 commit
- sync quantization code · 5b3fe9e7
  yan.yan authored Jan 10, 2023
04 Jan, 2023 1 commit
- sync quantization code · e387ee74
  yan.yan authored Jan 04, 2023
03 Jan, 2023 1 commit
- still working on int8 · b1c57a31
  yan.yan authored Jan 03, 2023
29 Dec, 2022 1 commit
- working on quantization · aa26c99e
  yan.yan authored Dec 29, 2022
27 Dec, 2022 1 commit

Large kernel for implicit gemm (#547) · ee8c9465

FindDefinition authored Dec 27, 2022

* large kernel bwd & bwdI; increment RS not tested

* large kernel fix, no split_mask and increment rs

* large kernel fix2, no split_mask and increment rs

* reset benchmark.py

* fix merge
Co-authored-by: EvernightAurora <2465542858@qq.com>

05 Nov, 2022 2 commits
- PyPI temporarily shut down, use a new version · bdfbf4a2
  yan.yan authored Nov 06, 2022
- fix #532 overflow in huge dim · e2df774f
  yan.yan authored Nov 06, 2022
26 Oct, 2022 1 commit
- v2.2.4: add prebuilts for cuda 11.6 and 11.8 · 1f5ce924
  yan.yan authored Oct 26, 2022
18 Oct, 2022 2 commits
- update RTX 4090 benchmark · bd5bc8db
  yan.yan authored Oct 18, 2022
- fix #524 and small bugs · 24df06fe
  yan.yan authored Oct 18, 2022
28 Sep, 2022 1 commit
- v2.2.3: fix contiguous, add message if points vanish · 8b52b3a9
  yan.yan authored Sep 28, 2022
26 Sep, 2022 3 commits
- fix small bug · 1661828b
  yan.yan authored Sep 27, 2022
- detailed libspconv example · 48c8434d
  yan.yan authored Sep 27, 2022
- update doc · 16a8cb24
  yan.yan authored Sep 26, 2022
25 Sep, 2022 5 commits
- update docs · 596a3cc0
  yan.yan authored Sep 25, 2022
- fix cuda version bug · b63c08aa
  yan.yan authored Sep 25, 2022
- fix CI problem · 77a7981a
  yan.yan authored Sep 25, 2022
- change cumm version · d4de767e
  yan.yan authored Sep 25, 2022
- fix build and nvrtc problem · bf34f040
  yan.yan authored Sep 25, 2022
24 Sep, 2022 2 commits
- fix python 3.11 build · 8c25ed52
  yan.yan authored Sep 25, 2022
- fix cpu only build problem · 2f66dd23
  yan.yan authored Sep 25, 2022