# Changelog

## [2.3.3] - 2023-02-02
### Fixed 
- Fix int8 NVRTC error when using the prebuilt package
- Fix int8 kernel when running on Turing GPUs

## [2.3.2] - 2023-01-20
### Changed 
- change version


## [2.3.1] - 2023-01-20
### Changed 
- change version

## [2.3.0] - 2023-01-19
### Added 
- Add int8 quantization support
- Add large kernel support for implicit GEMM (kernel volume kv <= 128)
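
The kernel volume (kv) is the number of kernel taps, i.e. the product of the kernel size along each dimension. The sketch below shows that arithmetic against the two limits stated in this changelog (kv <= 32 for implicit GEMM in 2.1.0, kv <= 128 from 2.3.0). The helper and constant names are hypothetical and are not part of the spconv API.

```python
from math import prod

# Limits as stated in this changelog; the names below are illustrative only.
KV_LIMIT_2_1_0 = 32   # implicit GEMM in 2.1.0
KV_LIMIT_2_3_0 = 128  # large kernel support in 2.3.0


def kernel_volume(kernel_size):
    """Number of kernel taps: the product of the per-dimension kernel sizes."""
    return prod(kernel_size)


def fits_limit(kernel_size, limit):
    """True if the kernel volume does not exceed the given limit."""
    return kernel_volume(kernel_size) <= limit
```

For example, a 3x3x3 kernel has kv = 27 and already fit the 2.1.0 limit, a 5x5x5 kernel (kv = 125) only fits under the 128 limit, and a 7x7x3 kernel (kv = 147) exceeds both.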

## [2.2.6] - 2022-11-06
### Fixed 
- CI failed because of a temporary PyPI shutdown; assigned a new version and ran it again.

## [2.2.5] - 2022-11-05
### Fixed 
- Fix overflow when shape is too large

## [2.2.4] - 2022-10-13
### Added 
- Add prebuilt for CUDA 11.8 (RTX 4090 and H100) and CUDA 11.6.
### Fixed 
- Fix small bugs

## [2.2.3] - 2022-09-28
### Fixed 
- Fix missing .contiguous() for input features
- Add some debug messages when points vanish.

## [2.2.2] - 2022-09-25
### Fixed 
- Fix CI problem: the main function was too long and caused OOM in the CI VM.

## [2.2.1] - 2022-09-25
### Fixed 
- Fix build problem
- Fix NVRTC problem

## [2.2.0] - 2022-09-24
### Added 
- Add Ampere support: faster fp16 and tf32 kernels, and much faster int8 kernels on Ampere GPUs.
- Add pure C++ code generation (libspconv.so) for deployment (or for training in another deep learning framework).
- Add NVRTC support for all GEMM kernels. If your GPU architecture isn't compiled into the prebuilt package, spconv uses slightly slower NVRTC kernels (10-20us overhead for every kernel launch).

### Fixed
- Fix launch failure in maxpool when there are too many voxels

### Changed
- All weights now use the KRSC layout; old spconv 1.x weights are no longer supported.
- GEMM ops previously in ops.py now run in C++ by default (controlled by spconv.constants.SPCONV_CPP_GEMM).

### Removed
- Drop Python 3.6 support.
- Pascal and Kepler architectures are removed from the CUDA 12 prebuilt.

## [2.1.22] - 2022-06-11
### Fixed
- Fix thrust problem by adding -fvisibility=hidden

## [2.1.22] - 2022-04-14
### Added
- Add full NVRTC support
- Add support for large spatial shapes and batch sizes: if a large shape is detected, int64 is used instead of int32 for hashing.
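
To see when int32 keys stop being enough, here is a minimal sketch that assumes the hash key is a row-major linearization of (batch, z, y, x). This changelog does not say exactly how spconv builds its keys, so the key layout and the helper names (linear_index, needs_int64_keys) are assumptions for illustration.

```python
from math import prod

INT32_MAX = 2**31 - 1


def linear_index(batch_idx, coord, spatial_shape):
    """Row-major flattening of (batch, *coord) into one integer key (assumed layout)."""
    idx = batch_idx
    for c, s in zip(coord, spatial_shape):
        idx = idx * s + c
    return idx


def needs_int64_keys(batch_size, spatial_shape):
    """True if the largest possible key no longer fits in int32."""
    max_key = batch_size * prod(spatial_shape) - 1  # last voxel of the last batch
    return max_key > INT32_MAX
```

For example, a batch of 8 with spatial shape [512, 512, 512] still fits in int32, while a single sample with shape [2048, 2048, 2048] already needs int64 keys. Python integers never overflow, so the sketch only reports when a fixed-width int32 key would.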

## [2.1.21] - 2021-12-09
### Added
- add sm_37
- Add fp16 kernels with fp32 accumulator (slower, but can avoid NaN when the channel size is too large)
- Add SPCONV_BWD_SPLITK environment variable to control split-k candidates.
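
Why an fp32 accumulator helps: the largest finite fp16 value is 65504, so a long dot product over many input channels can overflow to inf partway through even when the final result is small, and any later inf - inf produces NaN. The sketch below emulates fp16 and fp32 rounding with Python's struct module ('e' and 'f' formats). It only illustrates the numerics; it is not spconv's kernel code, and the input values are a toy example.

```python
import math
import struct


def round_to(fmt, x):
    """Round x to IEEE half ('e') or single ('f') precision.

    struct rounds to nearest-even but raises OverflowError instead of
    returning inf, so overflow is mapped to +/-inf like GPU arithmetic.
    """
    try:
        return struct.unpack(fmt, struct.pack(fmt, x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def dot(a, b, acc_fmt):
    """Dot product with fp16 inputs/products and an acc_fmt accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        product = round_to("e", round_to("e", x) * round_to("e", y))
        acc = round_to(acc_fmt, acc + product)
    return round_to("e", acc)  # the result is stored as fp16 again


# 512 "channels": the running sum peaks at 65536 before cancelling to 0.
C = 512
a = [16.0] * C
b = [16.0] * (C // 2) + [-16.0] * (C // 2)
out_fp16_acc = dot(a, b, "e")  # overflows to inf partway through
out_fp32_acc = dot(a, b, "f")  # stays finite and ends at 0.0
```

Only the accumulator precision differs between the two calls; the exact result is 0, but the fp16 accumulator returns inf, which becomes NaN as soon as it meets another inf with the opposite sign.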

## [2.1.20] - 2021-12-06
### Added
- Add fp16 conv SIMT kernels for mixed-precision training on Pascal or older GPUs. WARNING: not optimized for Tesla P100, which has 2x throughput in half precision.

## [2.1.19] - 2021-12-03
### Fixed
- Fix a wrong arch assert in all kernels for old GPUs so that spconv works on sm_50 GPUs

## [2.1.18] - 2021-11-29
### Fixed
- Fix a small bug of spatial_shape.
- Fix a bug in PointToVoxel: it must always return a clone instead of a view.

## [2.1.17] - 2021-11-29
### Fixed
- Fix a bug in sparse add.
- Fix a serious bug in conv weight init.
### Added
- Add more checks for wrong usage
- Add insert_exist_keys for hash table

## [2.1.16] - 2021-11-28
### Fixed
- Fix a strange compile problem on Windows

## [2.1.15] - 2021-11-28
### Fixed
- Fix missing pccm.Class in setup.py

## [2.1.14] - 2021-11-28
### Added 
- Add hash table
- update cumm version
- Add AddTableMisaligned for sparse tensors with the same shape but different indices.
### Fixed
- Fix a bug that was fixed in 2.1.10 but reintroduced in 2.1.12.
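
Assuming AddTableMisaligned behaves like a sparse sum, adding two sparse tensors whose active indices differ means taking the union of the indices and summing features where they overlap. The pure-Python sketch below shows only that semantics; add_misaligned is a hypothetical helper, it assumes indices are unique within each input, and the real module may order and batch its output differently.

```python
def add_misaligned(indices_a, feats_a, indices_b, feats_b):
    """Sum two sparse tensors with the same spatial shape but different indices.

    indices_*: lists of coordinate tuples, e.g. (batch, z, y, x)
    feats_*:   lists of feature vectors (lists of floats), one per index
    Returns (indices, feats) over the union of active sites, sorted by index;
    sites present in both inputs get the element-wise sum of their features.
    """
    merged = {}
    for idx, f in zip(indices_a, feats_a):
        merged[idx] = list(f)
    for idx, f in zip(indices_b, feats_b):
        if idx in merged:
            merged[idx] = [x + y for x, y in zip(merged[idx], f)]
        else:
            merged[idx] = list(f)
    indices = sorted(merged)
    return indices, [merged[i] for i in indices]
```

When both inputs have identical indices this reduces to a plain element-wise feature sum, which is the aligned case.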

## [2.1.13] - 2021-?-?
### Added 
- Add some ops from spconv 1.x; see spconv.utils for more details.
- Add some debug tools so users can attach more info to issues.

## [2.1.12] - 2021-11-23
### Added 
- Add a method for the voxel generator to get pc_voxel_id, which is commonly used in semantic segmentation
### Fixed
- Fix a bug in the CUDA voxel generator when max_voxels is smaller than the real number of voxels
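
pc_voxel_id maps each input point to the voxel it was assigned to, which lets per-voxel predictions be scattered back to points in semantic segmentation. The sketch below is a hypothetical pure-Python voxelizer (voxelize_with_point_ids is not the spconv API). It assumes that points outside the grid, or whose voxel was dropped because max_voxels was reached, get id -1.

```python
import math


def voxelize_with_point_ids(points, voxel_size, range_min, grid_shape, max_voxels):
    """Assign points to voxels and record, per point, the voxel it landed in.

    Returns (voxel_coords, pc_voxel_id). pc_voxel_id[i] is the index of the
    voxel holding point i, or -1 if the point lies outside the grid or its
    voxel was dropped because max_voxels was already reached (assumption).
    """
    voxel_of = {}  # voxel coordinate -> voxel index
    voxel_coords = []
    pc_voxel_id = []
    for p in points:
        coord = tuple(
            math.floor((x - lo) / s) for x, lo, s in zip(p, range_min, voxel_size)
        )
        if any(c < 0 or c >= g for c, g in zip(coord, grid_shape)):
            pc_voxel_id.append(-1)  # outside the voxel grid
            continue
        vid = voxel_of.get(coord)
        if vid is None:
            if len(voxel_coords) >= max_voxels:
                pc_voxel_id.append(-1)  # no room for a new voxel
                continue
            vid = len(voxel_coords)
            voxel_of[coord] = vid
            voxel_coords.append(coord)
        pc_voxel_id.append(vid)
    return voxel_coords, pc_voxel_id
```

Given per-voxel labels, per-point labels follow as `[voxel_labels[v] if v >= 0 else -1 for v in pc_voxel_id]`, with -1 standing in for an ignore label.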

## [2.1.11] - 2021-11-22
### Fixed
- Fixed a bug in Volta kernels (TITAN V, Tesla V100): backward weight kernels used f16 as the accumulator; they should use f32.
- Fixed a corner case when the kernel size is 1x1 but stride != 1.
- Fixed a corner case in maxpool when the input feature is non-contiguous.

## [2.1.10] - 2021-11-19
### Fixed
- Fixed a bug in utils.PointToVoxel: CPU code shouldn't get a CUDA stream

## [2.1.9] - 2021-11-18
### Removed
- Remove a wrong assert

## [2.1.8] - 2021-11-15
### Added
- Add support for PyTorch 1.5

## [2.1.7] - 2021-11-11
### Fixed
- Fix a bug when a network has inverse conv layers and runs inference in eval mode.

## [2.1.6] - 2021-11-10
### Fixed
- Fix missing -fopenmp in linker flags for CPU-only builds
### Removed
- remove stale comment sending in CI

## [2.1.5] - 2021-11-10
### Added
- Add cuda profile tool
- Add Python 3.6 support
### Changed
- Format all code
### Removed
- Remove an unnecessary device sync, slightly improving performance.

## [2.1.4] - 2021-11-10
### Fixed
- Fix a bug in SparseInverseConv3d

## [2.1.3] - 2021-11-08
### Fixed
- Fix a bug in the CPU-only package

## [2.1.2] - 2021-11-06
### Fixed
- Fix a bug with Python 3.7

## [2.1.0] - 2021-10-31
### Added
- Add implicit GEMM algorithm for all kinds of convolution with kernel volume <= 32. This algorithm is very fast with float16.
- Add PyTorch wrapper for voxel generator
- add CPU support and CPU-only build.

## [2.0.2] - 2021-10-26
### Fixed
- Fix a serious bug where SparseSequential did nothing with non-spconv layers
- Fix a bug in ProxyableClassMeta

## [2.0.0] - 2021-10-16
### Changed
- Change build system from cmake to pccm.
- Move PyTorch Python code to spconv.pytorch
- Rewrite all C++ code.

## [1.2.1] - 2020-06-04
### Changed
- Greatly increase subm indice pair generation speed with two tricks:
  1. Most subm convs use only kernel size 3, so loops can be unrolled for a 100% performance increase.
  2. Subm indice pairs have the property indicePairs[0, i] = indicePairs[1, kernelVolume - i - 1], which gives another 100% performance increase.
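
The symmetry in the second trick follows from submanifold conv using the same sites for input and output: for an odd kernel enumerated in row-major order, offset kernelVolume - i - 1 is the negation of offset i, so the pairs for one offset are the pairs for its mirror with input and output swapped. The pure-Python sketch below builds pairs for only half the offsets and mirrors the rest. The function names are hypothetical and this is not spconv's CUDA implementation.

```python
from itertools import product


def kernel_offsets(ksize, ndim):
    """Kernel offsets in row-major order; for odd ksize, offsets[kv-1-i] == -offsets[i]."""
    r = ksize // 2
    return list(product(range(-r, r + 1), repeat=ndim))


def pairs_for_offset(coords, index, off):
    """(input ids, output ids) for one kernel offset of a submanifold conv.

    Output sites are exactly the active input sites; output p reads
    input p + off when that site is active.
    """
    ins, outs = [], []
    for out_id, p in enumerate(coords):
        q = tuple(a + b for a, b in zip(p, off))
        if q in index:
            ins.append(index[q])
            outs.append(out_id)
    return ins, outs


def subm_pairs_full(coords, ksize=3):
    """Scan every kernel offset."""
    index = {c: n for n, c in enumerate(coords)}
    offsets = kernel_offsets(ksize, len(coords[0]))
    return [pairs_for_offset(coords, index, off) for off in offsets]


def subm_pairs_mirrored(coords, ksize=3):
    """Scan only offsets 0..kv//2, then mirror the rest.

    Uses indicePairs[0, i] == indicePairs[1, kv - 1 - i]: the pairs of the
    mirrored offset are the same pairs with input and output swapped.
    """
    index = {c: n for n, c in enumerate(coords)}
    offsets = kernel_offsets(ksize, len(coords[0]))
    kv = len(offsets)
    pairs = [None] * kv
    for i in range(kv // 2 + 1):
        pairs[i] = pairs_for_offset(coords, index, offsets[i])
    for i in range(kv // 2 + 1, kv):
        ins_m, outs_m = pairs[kv - 1 - i]
        pairs[i] = (outs_m, ins_m)
    return pairs
```

For a 3x3x3 kernel the mirrored version scans 14 of the 27 offsets, roughly half the work, which is in line with the roughly 2x gain described for this trick.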


## [1.2.0] - 2020-05-28
### Added
- Add batch GEMM support: a small performance increase at the cost of more GPU memory. Use algo=spconv.ConvAlgo.Batch to enable it.

### Changed
- Replace most 'functor' usage with C++14 dispatch in C++ code.

### Fixed
- Change gather/scatterAdd kernel parameters to support a large number of points.