A faster & more memory-efficient implementation of DynamicScatter (#318)
* a faster & more memory-efficient implementation of DynamicScatter
* fix format issues and add a pytest skip for tests on machines without CUDA support
* some trivial changes:
  - decrease the number of kernel threads per block to 512, so that inference also runs on GPUs with compute capability lower than 2.0
  - change the backpropagation behavior of the max reduction: when multiple points share the same maximum feature value, only the first of them (the one with the lowest row index) propagates the output gradient back. Before this change, every point holding the maximum feature value received the output gradient. The new behavior is consistent with torch.max (see the sketches below). It may cause a gradcheck failure in test_dynamic_scatter.py; this is expected, since torch.max fails the same gradcheck.
* fix typo
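To make the reduction concrete, here is a pure-PyTorch reference sketch of a dynamic scatter with max reduction: per-point features are grouped by their integer voxel coordinates and reduced per voxel. The function name, signature, and use of `scatter_reduce_` (PyTorch >= 1.12) are illustrative assumptions, not the API or the kernel added by this PR; it also covers only the forward reduction, not the backward tie-breaking rule changed above.

```python
import torch

def dynamic_scatter_max(feats: torch.Tensor, coors: torch.Tensor):
    """Reference sketch: per-voxel max over point features.

    feats: (N, C) float features, one row per point.
    coors: (N, 3) integer voxel coordinates, one row per point.
    Returns (voxel_feats, voxel_coors), where voxel_feats[i] is the
    channel-wise max over all points that fall into voxel_coors[i].
    """
    voxel_coors, inverse = torch.unique(coors, return_inverse=True, dim=0)
    num_voxels, num_channels = voxel_coors.size(0), feats.size(1)
    voxel_feats = feats.new_full((num_voxels, num_channels), float('-inf'))
    # 'amax' keeps, per voxel and per channel, the largest feature value
    voxel_feats.scatter_reduce_(
        0, inverse.unsqueeze(1).expand(-1, num_channels), feats, reduce='amax')
    return voxel_feats, voxel_coors
```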
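And a minimal demonstration of the tie-breaking point, assuming only stock PyTorch: when two rows tie for the maximum, torch.max backpropagates the whole gradient to a single row, and gradcheck reports a mismatch because the numerical Jacobian spreads the gradient across the tied rows.

```python
import torch

# Two points share the maximum feature value; torch.max routes the
# whole gradient to a single row (the returned index), not to both.
feats = torch.tensor([[3.0], [3.0], [1.0]], requires_grad=True)
values, indices = feats.max(dim=0)
values.sum().backward()
print(indices)     # a single index, e.g. tensor([0])
print(feats.grad)  # only that row is 1., the other tied row is 0.

# gradcheck fails on tied inputs: the numerical gradient splits the
# contribution across the tied rows (0.5 each), the analytical one
# does not, so torch.max itself does not pass this check either.
tied = torch.tensor([[2.0], [2.0]], dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(
    lambda x: x.max(dim=0).values, (tied,), raise_exception=False))  # False
```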
Co-authored-by: zhanggefan <1152009@tongji.edu.cn>