This library brings [Spatially-sparse convolutional networks](https://github.com/btgraham/SparseConvNet) to PyTorch. Moreover, it introduces **Submanifold Sparse Convolutions**, that can be used to build computationally efficient sparse VGG/ResNet/DenseNet-style networks.
| 1 | 2.0.1+das.dtk2504 | v2.4.1 | dtk2504|
| 1 | 2.1.0+das.dtk2504 | v2.5.1 | dtk2504|
With regular 3x3 convolutions, the set of active (non-zero) sites grows rapidly:<br/>
| 1 | 2.0.1+das.dtk25041 | v2.4.1 | dtk25041|
<br/>
| 1 | 2.1.0+das.dtk25041 | v2.5.1 | dtk25041|
With **Submanifold Sparse Convolutions**, the set of active sites is unchanged. Active sites look at their active neighbors (green); non-active sites (red) have no computational overhead: <br/>
## 编译流程
<br/>
```
Stacking Submanifold Sparse Convolutions to build VGG and ResNet type ConvNets, information can flow along lines or surfaces of active points.<br/>
Disconnected components don't communicate at first, although they will merge due to the effect of strided operations, either pooling or convolutions. Additionally, adding ConvolutionWithStride2-SubmanifoldConvolution-DeconvolutionWithStride2 paths to the network allows disjoint active sites to communicate; see the 'VGG+' networks in the paper.<br/>
From left: **(i)** an active point is highlighted; a convolution with stride 2 sees the green active sites **(ii)** and produces output **(iii)**, 'children' of hightlighted active point from (i) are highlighted; a submanifold sparse convolution sees the green active sites **(iv)** and produces output **(v)**; a deconvolution operation sees the green active sites **(vi)** and produces output **(vii)**.
source /usr/local/bin/fastpt -c
python3 setup.py bdist_wheel
## Dimensionality and 'submanifolds'
```
## 验证安装
SparseConvNet supports input with different numbers of spatial/temporal dimensions.
```
Higher dimensional input is more likely to be sparse because of the 'curse of dimensionality'. <br/>
source /usr/local/bin/fastpt -e
pip3 list | grep sparseconvnet
Dimension|Name in 'torch.nn'|Use cases
```
:--:|:--:|:--:
## 测试
1|Conv1d| Text, audio
```
2|Conv2d|Lines in 2D space, e.g. handwriting
source /usr/local/bin/fastpt -e
3|Conv3d|Lines and surfaces in 3D space or (2+1)D space-time
cd examples
4| - |Lines, etc, in (3+1)D space-time
python3 hello-world.py
We use the term 'submanifold' to refer to input data that is sparse because it has a lower effective dimension than the space in which it lives, for example a one-dimensional curve in 2+ dimensional space, or a two-dimensional surface in 3+ dimensional space.
In theory, the library supports up to 10 dimensions. In practice, ConvNets with size-3 SVC convolutions in dimension 5+ may be impractical as the number of parameters per convolution is growing exponentially. Possible solutions include factorizing the convolutions (e.g. 3x1x1x..., 1x3x1x..., etc), or switching to a hyper-tetrahedral lattice (see [Sparse 3D convolutional neural networks](http://arxiv.org/abs/1505.02890)).
## Hello World
SparseConvNets can be built either by [defining a function that inherits from torch.nn.Module](examples/Assamese_handwriting/VGGplus.py) or by stacking modules in a [sparseconvnet.Sequential](PyTorch/sparseconvnet/sequential.py):
To run the examples you may also need to install unrar:
```
apt-get install unrar
```
## License
SparseConvNet is BSD licensed, as found in the LICENSE file. [Terms of use](https://opensource.facebook.com/legal/terms). [Privacy](https://opensource.facebook.com/legal/privacy)
1.[ICDAR 2013 Chinese Handwriting Recognition Competition 2013](https://web.archive.org/web/20160418143451/http://www.nlpr.ia.ac.cn/events/CHRcompetition2013/competition/Home.html) First place in task 3, with test error of 2.61%. Human performance on the test set was 4.81%. [Report](https://web.archive.org/web/20160910012723/http://www.nlpr.ia.ac.cn/events/CHRcompetition2013/competition/ICDAR%202013%20CHR%20competition.pdf)
2.[Spatially-sparse convolutional neural networks, 2014](http://arxiv.org/abs/1409.6070) SparseConvNets for Chinese handwriting recognition
3.[Fractional max-pooling, 2014](http://arxiv.org/abs/1412.6071) A SparseConvNet with fractional max-pooling achieves an error rate of 3.47% for CIFAR-10.
4.[Sparse 3D convolutional neural networks, BMVC 2015](http://arxiv.org/abs/1505.02890) SparseConvNets for 3D object recognition and (2+1)D video action recognition.
5.[Kaggle plankton recognition competition, 2015](https://www.kaggle.com/c/datasciencebowl) Third place. The competition solution is being adapted for research purposes in [EcoTaxa](http://ecotaxa.obs-vlfr.fr/).
6.[Kaggle Diabetic Retinopathy Detection, 2015](https://www.kaggle.com/c/diabetic-retinopathy-detection/) First place in the Kaggle Diabetic Retinopathy Detection competition.
7.[SparseConvNet 'classic'](https://github.com/btgraham/SparseConvNet-archived) version
8.[Submanifold Sparse Convolutional Networks, 2017](https://arxiv.org/abs/1706.01307) Introduces deep 'submanifold' SparseConvNets.
9.[Workshop on Learning to See from 3D Data, 2017](https://shapenet.cs.stanford.edu/iccv17workshop/) First place in the [semantic segmentation](https://shapenet.cs.stanford.edu/iccv17/) competition. [Report](https://arxiv.org/pdf/1710.06104)
10.[3D Semantic Segmentation with Submanifold Sparse Convolutional Networks, 2017](https://arxiv.org/abs/1711.10275) Semantic segmentation for the ShapeNet Core55 and NYU-DepthV2 datasets, CVPR 2018
11.[Unsupervised learning with sparse space-and-time autoencoders](https://arxiv.org/abs/1811.10355)(3+1)D space-time autoencoders
12.[ScanNet 3D semantic label benchmark 2018](http://kaldir.vc.in.tum.de/scannet_benchmark/semantic_label_3d) 0.726 average IOU for 3D semantic segmentation.
13.[MinkowskiEngine](https://github.com/StanfordVL/MinkowskiEngine) is an alternative implementation of SparseConvNet; [0.736 average IOU for ScanNet](https://github.com/chrischoy/SpatioTemporalSegmentation).
14.[SpConv: PyTorch Spatially Sparse Convolution Library](https://github.com/traveller59/spconv) is an alternative implementation of SparseConvNet.
15.[Live Semantic 3D Perception for Immersive Augmented Reality](https://ieeexplore.ieee.org/document/8998140) describes a way to optimize memory access for SparseConvNet.
16.[OccuSeg](https://arxiv.org/abs/2003.06537) real-time object detection using SparseConvNets.
17.[TorchSparse](https://github.com/mit-han-lab/torchsparse) implements 3D submanifold convolutions.
19.[VoTr](https://github.com/PointsCoder/VOTR) implements submanifold [voxel transformers](https://openaccess.thecvf.com/content/ICCV2021/papers/Mao_Voxel_Transformer_for_3D_Object_Detection_ICCV_2021_paper.pdf) using [SpConv](https://github.com/traveller59/spconv).
20.[Mix3D](https://github.com/kumuji/mix3d) brings [MixUp](https://openreview.net/forum?id=r1Ddp1-Rb) to the sparse setting— 0.781 average IOU for ScanNet 3D semantic segmentation.
21.[Point Transformer V3](https://arxiv.org/abs/2312.10035) uses sparse convolutions as an enhanced conditional positional encoding (xCPE); 0.794 average IOU for ScanNet 3D semantic segmentation.
## Citations
If you find this code useful in your research then please cite:
**[3D Semantic Segmentation with Submanifold Sparse Convolutional Networks, CVPR 2018](https://arxiv.org/abs/1711.10275)**<br/>