[Spconv 1.x code](https://github.com/traveller59/spconv/tree/v1.2.1). We won't provide any support for spconv 1.x since it's deprecated. use spconv 2.x if possible.
```spconv``` is a project that provide heavily-optimized sparse convolution implementation with tensor core support.
## Breaking changes in Spconv 2.x
*```spconv.xxx``` move to ```spconv.pytorch.xxx```, change all ```import spconv``` to ```import spconv.pytorch as spconv``` and ```from spconv.xxx import``` to ```from spconv.pytorch.xxx import```.
*```use_hash``` in Sparse Convolution is removed, we only use hash table in 2.x.
*```x.features = F.relu(x)``` now raise error. use ```x = x.replace_feature(F.relu(x.features))``` instead.
* weight layout has been changed to RSKC (native algorithm) or KRSC (implicit gemm), no longer RSCK (spconv 1.x). RS is kernel size, C is input channel, K is output channel.
* all util ops are removed (pillar scatter/nms/...)
* VoxelGenerator has been replaced by Point2VoxelGPU[1-4]d/Point2VoxelCPU[1-4]d.
* spconv 2.x don't support CPU for now
* test spconv 1.x model in spconv 2.x: set environment variable before run program. Linux: ```export SPCONV_FILTER_HWIO="1"```, Windows powershell: ```$Env:SPCONV_FILTER_HWIO = "1"```
## Upcoming release Spconv 2.1.0 (Delay to 11.7.2021, sorry):
[Spconv 1.x code](https://github.com/traveller59/spconv/tree/v1.2.1). We won't provide any support for spconv 1.x since it's deprecated. use spconv 2.x if possible. <!--remove this message in spconv 2.2-->
**Status**: CPU build is ready. implicit gemm is ready. working on implicit-gemm-style indice generation for standard conv/pool, and implicit-gemm-style maxpool op.
## Breaking changes in Spconv 2.x
* implicit gemm algorithm, greatly faster than native algorithm when using float16 (tested in RTX 3080 Laptop).
* simple CPU support and CPU-only build
* add pytorch cpu/cuda voxel generator
* fix a bug of mixed precision training.
Spconv 1.x users **NEED READ [THIS](docs/SPCONV_2_BREAKING_CHANGEs.md)** before using spconv 2.x.
## News in Spconv 2.0.0
## Spconv 2.1 vs Spconv 1.x
* training/inference speed is increased (+50~80% for float32)
* support int8/tensor core
* spconv now can be installed by **pip**. see install section in readme for more details.
* Microsoft Windows support (only windows 10 has been tested).
* fp32 (not tf32) training/inference speed is increased (+50~80%)
* fp16 training/inference speed is greatly increased when your layer support tensor core (channel size must be multiple of 8).
* int8 op is ready, but we still need some time to figure out how to run int8 in pytorch.
* doesn't depend on pytorch binary.
* since spconv 2.x doesn't depend on pytorch binary (never in future), it's impossible to support torch.jit/libtorch inference.
TODO Spconv vs [MinkowskiEngine](https://github.com/NVIDIA/MinkowskiEngine) vs [torchsparse](https://github.com/mit-han-lab/torchsparse)
-->
## Roadmap for Spconv 2.2-2.3:
* TensorFormat32 support for faster fp32 training when you use NVIDIA Geforce RTX 30x0/Tesla A100/Quadro RTX Ax000 (2.2)
* change implicit gemm weight layout from KRSC to RSKC to make sure we can use native algorithm with implicit gemm weight. (2.2)
* documents (2.2)
* Ampere feature support (2.3)
* pytorch int8 inference, and QAT support (2.3)
## Usage
Firstly you need to use ```import spconv.pytorch as spconv``` in spconv 2.x.
Then see docs/USAGE.md.
Then see [this](docs/USAGE.md).
Don't forget to check [performance guide](docs/PERFORMANCE_GUIDE.md).
## Install
...
...
@@ -73,21 +75,50 @@ You need at least CUDA 10.2 to build and run spconv 2.x. We won't offer any supp
### Prebuilt
We offer python 3.7-3.10 and 11.1/11.4 prebuilt binaries for linux (manylinux) and windows 10/11.
CUDA 10.2 support will be added in version 2.0.2.
We offer python 3.7-3.10 and cuda 10.2/11.1/11.3/11.4 prebuilt binaries for linux (manylinux) and windows 10/11.
We will offer prebuilts for CUDA versions supported by latest pytorch release. For example, pytorch 1.9 support cuda 10.2 and 11.1, so we support them too.
We will provide prebuilts for CUDA versions supported by latest pytorch release. For example, pytorch 1.10 provide cuda 10.2 and 11.3 prebuilts, so we provide them too.
For Linux users, you need to install pip >= 20.3 first to install prebuilt.
CUDA 11.1 will be removed in spconv 2.2 because pytorch 1.10 don't provide prebuilts for it.
```pip install spconv``` for CPU only (**Linux Only**). you should only use this for debug usage, the performance isn't optimized due to manylinux limit (no omp support).
```pip install spconv-cu102``` for CUDA 10.2
```pip install spconv-cu111``` for CUDA 11.1
```pip install spconv-cu113``` for CUDA 11.3
```pip install spconv-cu114``` for CUDA 11.4
**NOTE** It's safe to have different minor cuda version between system and conda (pytorch). for example, you can use spconv-cu114 with anaconda version of pytorch cuda 11.1 in a OS with CUDA 11.2 installed.
**NOTE** It's safe to have different **minor** cuda version between system and conda (pytorch) **in Linux**. for example, you can use spconv-cu114 with anaconda version of pytorch cuda 11.1 in a OS with CUDA 11.2 installed.
### Build from source for development (JIT, recommend)
The c++ code will be built automatically when you change c++ code in project.
For NVIDIA Embedded Platforms, you need to specify cuda arch before build: ```export CUMM_CUDA_ARCH_LIST="7.2"``` for xavier.
### Native Explicit (deprecated and removed in spconv 2.x)
native algorithm (explicit, no fused) is standard gather-gemm-scatter algorithm. Assume we compute 3x3 conv, We can split it to 9 of 1x1 conv which can be computed by matmul, then sum them to get final result.
For sparse convolution, we also do split-gemm-sum to calculate conv, but we need to collect data first because it's sparse.
### Native
Fused version of above algorithm. 1.5x-2x faster than non-fused version.
### Implicit Gemm
```Native``` algorithm do minimal mma (matrix multiply add), but it need to serialize IO. The pipeline of ```Native``` is gather-gemm-scatter-gather-gemm-scatter-...
```Implicit Gemm``` fuse all calculation to one kernel and perform overlapped gather-mma-scatter to save a lot of time.
In my test, ```Implicit Gemm``` is almost 2x faster than ```Native```.
### Implicit Gemm Split Mask
TODO
In my test, ```Implicit Gemm Split Mask``` is slightly faster than ```Implicit Gemm```, but the indice generation is greatly slower, so currently we use ```Implicit Gemm``` by default.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Breaking changes in Spconv 2.x for spconv 1.x users
*```spconv.xxx``` move to ```spconv.pytorch.xxx```, change all ```import spconv``` to ```import spconv.pytorch as spconv``` and ```from spconv.xxx import``` to ```from spconv.pytorch.xxx import```.
*```use_hash``` and ```fused_bn``` in Sparse Convolution is removed, we only use hash table in 2.x.
*```x.features = F.relu(x.features)``` now raise error. use ```x = x.replace_feature(F.relu(x.features))``` instead.
* weight layout has been changed to RSKC (native algorithm) or KRSC (implicit gemm), no longer RSCK (spconv 1.x). RS is kernel size, C is input channel, K is output channel.
* all util ops are removed (pillar scatter/nms/rbbox_iou...)
* VoxelGenerator has been replaced by Point2VoxelGPU[1-4]d/Point2VoxelCPU[1-4]d.
* spconv < 2.1 don't support CPU. spconv 2.1+ support cpu for debug usage.
* test spconv 1.x model in spconv 2.x: set environment variable before run program. Linux: ```export SPCONV_FILTER_HWIO="1"```, Windows powershell: ```$Env:SPCONV_FILTER_HWIO = "1"```. **WARNING** test spconv 1.x model don't support implicit gemm algorithm, you need to train from scratch with spconv 2.x and select ConvAlgo.MaskSplitImplicitGemm.