Ping Gong authored
* Leverage hashmap to accelerate CSRSliceMatrix
* fix lint check
* use `min` in cuda_runtime.ch
* fix hash func
* add some comments and adjust the <grid,block> of the _SegmentMaskColKernel kernel
* set device and stream for thrust::for_each
* use thrust::cuda::par_nosync

Co-authored-by: Xin Yao <xiny@nvidia.com>
aa419895