SPCONV_DEVELOP_PLAN.md

<!--
 Copyright 2021 Yan Yan
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
 
     http://www.apache.org/licenses/LICENSE-2.0
 
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

## Spconv 2.x Develop Plan

If you want to contribute to spconv 2.x, feel free to start a new discussion on GitHub or email me.


### v2.2 Core Features

- [ ] TF32 support 
- [ ] Make ```ConvAlgo.Native``` runnable in KRSC layout and use only this layout in the future
- [ ] PyTorch Int8 Support 

### v2.3 Core Features

- [ ] Move most of the functions in spconv.pytorch.ops to C++
- [ ] Ampere multi-stage gemm support
- [ ] Optimize CUDA kernels for small-channel-size layers

### v2.4 Core Features

- [ ] nvrtc support for gemm/conv kernels
- [ ] C++ only spconv
- [ ] TensorRT support

### Misc Features That Need Contributions

- [ ] Test spconv 2.x in [torch-points3d](https://github.com/nicolas-chaulet/torch-points3d) and other frameworks
- [ ] Documentation on GitHub Pages
- [ ] Better tests


### Details

1. TF32 support 

We only need to add TF32 tensor core support to cumm. Not hard.

2. Make ```ConvAlgo.Native``` runnable in KRSC layout

Add a stride argument to the gemm kernels, then use offset + stride to make the gemm kernel treat the KRSC layout as a "KC" matrix.
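The offset + stride idea can be illustrated with plain index arithmetic. The sketch below is not cumm or spconv code; `krsc_index` and `kc_view` are hypothetical names, and it assumes a contiguous weight buffer laid out as KRSC (output channel, kernel height, kernel width, input channel). For a fixed kernel offset `(r, s)`, the K x C matrix starts at `(r * S + s) * C` and its rows are `R * S * C` elements apart.

```python
# Hypothetical helpers: they only illustrate the index arithmetic,
# they are not part of cumm or spconv.

def krsc_index(k, r, s, c, R, S, C):
    """Flat index of element (k, r, s, c) in a contiguous KRSC buffer."""
    return ((k * R + r) * S + s) * C + c


def kc_view(flat, r, s, K, R, S, C):
    """Read the K x C matrix for kernel offset (r, s) via offset + stride."""
    offset = (r * S + s) * C  # start of row k = 0 for this (r, s)
    stride = R * S * C        # distance between rows k and k + 1
    return [
        [flat[offset + k * stride + c] for c in range(C)]
        for k in range(K)
    ]


K, R, S, C = 2, 3, 3, 4
flat = list(range(K * R * S * C))  # each weight value equals its flat index

# K x C matrix for kernel offset (r=1, s=2), read without copying the layout.
mat = kc_view(flat, r=1, s=2, K=K, R=R, S=S, C=C)
```

Because the rows stay contiguous along C, a gemm kernel that accepts a base offset and a row stride could read this slice directly from the KRSC buffer without a separate layout conversion.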

3. PyTorch Int8 Support

...

4. Move most of the functions in spconv.pytorch.ops to C++

Pure engineering work.

5. Ampere multi-stage gemm support

Not easy; we need to use a new pattern to write the gemm kernels.

6. Optimize CUDA Kernels for small-channel-size layers

Modify cumm so that it supports small kernels. Not hard, but it takes time.

7. nvrtc support for gemm/conv kernels

We need to rewrite the kernel params in cumm. Not easy.

8. C++ only spconv

The code generation itself is easy; we can finish this quickly after moving the ops to C++.

9. TensorRT support

TensorRT support is the last feature in this plan. It needs a lot of engineering work and prerequisites, so it may take a long time.