fix #575 use a flag to enable large-kernel algo

Commit cd99e7a6 authored Mar 24, 2023 by yan.yan

Showing with 260 additions and 182 deletions

CHANGELOG.md +1 -1
docs/PERFORMANCE_GUIDE.md +3 -1
spconv/pytorch/conv.py +256 -180

--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,7 +2,7 @@
 ## [2.3.5] - 2023-03-24
 ### Fixed 
- pypi project reach size limit, so we need to assign a new version number.
+- use a flag to enable large kernel algo (need time to compile at runtime)
 ## [2.3.4] - 2023-03-23
 ### Added 

--- a/docs/PERFORMANCE_GUIDE.md
+++ b/docs/PERFORMANCE_GUIDE.md
@@ -27,3 +27,5 @@
 * If you train with float32 and ampere or later GPUs, you can set ```spconv.constants.SPCONV_ALLOW_TF32``` to enable faster fp32 training.
 See [benchmark](BENCHMARK.md) for more performance details of different algorithms.
 * Different CUDA version of spconv may have different performance. Use newest cuda version if possible. For example, spconv-cu117 is faster than spconv-cu114, spconv-cu114 is faster than spconv-cu111.
+* If your kernel volume (the product of the kernel sizes) is larger than 32, spconv uses a slower algorithm that is also less accurate in fp16. To use a faster algorithm for large kernels (it needs time to compile at runtime), set ```large_kernel_fast_algo=True```.
+* Use ```SparseGlobalMaxPool``` instead of a large kernel size when you need global pooling.
\ No newline at end of file
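
The new guide entry depends on the kernel volume crossing 32. As a rough illustration only, the check could be sketched as below. The helper names and the constant name are hypothetical (they are not part of spconv), and the sketch assumes that "kernel size volume" means the product of the per-dimension kernel sizes.

```python
from math import prod
from typing import Sequence, Union

# Threshold quoted in the performance guide; the constant name is hypothetical.
LARGE_KERNEL_VOLUME_THRESHOLD = 32


def kernel_volume(kernel_size: Union[int, Sequence[int]], ndim: int = 3) -> int:
    """Return the number of kernel offsets, i.e. the product of the kernel sizes."""
    if isinstance(kernel_size, int):
        # A scalar kernel size is applied to every spatial dimension.
        return kernel_size ** ndim
    if len(kernel_size) != ndim:
        raise ValueError(f"expected {ndim} kernel sizes, got {len(kernel_size)}")
    return prod(kernel_size)


def is_large_kernel(kernel_size: Union[int, Sequence[int]], ndim: int = 3) -> bool:
    """True when the kernel volume exceeds the threshold named in the guide."""
    return kernel_volume(kernel_size, ndim) > LARGE_KERNEL_VOLUME_THRESHOLD
```

Under these assumptions a 3x3x3 kernel (volume 27) stays on the default path, while a 5x5x5 kernel (volume 125) is one where the guide suggests setting `large_kernel_fast_algo=True`. Where exactly the flag is accepted is not visible in this excerpt; see the `spconv/pytorch/conv.py` changes in this commit.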
--- a/spconv/pytorch/conv.py
+++ b/spconv/pytorch/conv.py