Commit cd99e7a6 authored by yan.yan

fix #575 use a flag to enable large-kernel algo

parent f101f97e
@@ -2,7 +2,7 @@
 ## [2.3.5] - 2023-03-24
 ### Fixed
-- pypi project reach size limit, so we need to assign a new version number.
+- use a flag to enable large kernel algo (need time to compile at runtime)
 ## [2.3.4] - 2023-03-23
 ### Added
......
@@ -27,3 +27,5 @@
 * If you train with float32 and ampere or later GPUs, you can set ```spconv.constants.SPCONV_ALLOW_TF32``` to enable faster fp32 training.
 See [benchmark](BENCHMARK.md) for more performance details of different algorithms.
 * Different CUDA version of spconv may have different performance. Use newest cuda version if possible. For example, spconv-cu117 is faster than spconv-cu114, spconv-cu114 is faster than spconv-cu111.
+* if your kernel size volume larger than 32, spconv will use a slower (and more inaccurate in fp16) algorithm. to use a faster algo for large kernel size (need time to compile at runtime), use ```large_kernel_fast_algo=True```
+* use ```SparseGlobalMaxPool``` instead of use large kernel size when you need global pool.
\ No newline at end of file
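
The "kernel size volume larger than 32" rule in the added guide text can be checked with a small helper. This is a minimal sketch, not spconv code. It assumes "kernel size volume" means the product of the kernel extents along each axis. The helper names `kernel_volume` and `wants_large_kernel_fast_algo` are hypothetical. The diff names only the `large_kernel_fast_algo=True` keyword, so passing it to a convolution layer constructor, as in the trailing comment, is an assumption to verify against the spconv version in use.

```python
from math import prod
from typing import Sequence, Union

# Threshold quoted in the guide text; everything else here is illustrative.
LARGE_KERNEL_VOLUME_LIMIT = 32

KernelSize = Union[int, Sequence[int]]


def kernel_volume(kernel_size: KernelSize, ndim: int = 3) -> int:
    """Number of kernel offsets: k**ndim for an int, else the product of extents."""
    if isinstance(kernel_size, int):
        return kernel_size ** ndim
    return prod(kernel_size)


def wants_large_kernel_fast_algo(kernel_size: KernelSize, ndim: int = 3) -> bool:
    """True when the kernel volume is strictly larger than the limit of 32."""
    return kernel_volume(kernel_size, ndim) > LARGE_KERNEL_VOLUME_LIMIT


# Assumed usage (not confirmed by this diff): forward the result as the
# keyword the guide mentions when building a layer, for example
#   conv = spconv.SubMConv3d(16, 32, kernel_size=5,
#                            large_kernel_fast_algo=wants_large_kernel_fast_algo(5))
```

Under that reading, a 3x3x3 kernel (volume 27) stays on the default path, while a 5x5x5 kernel (volume 125) is where the flag, and its one-time runtime compilation cost, becomes relevant.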
This diff is collapsed.