[Performance] Leverage hashmap to accelerate CSRSliceMatrix<kDGLCUDA, IdType> (#4924)
* Leverage hashmap to accelerate CSRSliceMatrix
* fix lint check
* use `min` in cuda_runtime.h
* fix hash func
* add some comments and adjust the <grid,block> of the _SegmentMaskColKernel kernel
* set device and stream for thrust::for_each
* use thrust::cuda::par_nosync
Co-authored-by: Xin Yao <xiny@nvidia.com>