[Fix] Wrap all CUDA runtime API/CUB calls with macro (#4083)
* Wrap all CUDA runtime API/CUB calls with macro
* remove the usage of explicit cudaMalloc in favor of AllocWorkspace
* fix typo
Co-authored-by: Israt Nisa <neesha295@gmail.com>
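For readers unfamiliar with the pattern named in the title, here is a minimal sketch of a call-wrapping macro. It is not the macro from this commit: `CHECKED_CALL`, `FakeStatus`, `FakeStatusString`, and `FakeAlloc` are hypothetical stand-ins chosen so the sketch builds without the CUDA toolkit. With the real runtime, the status type would be `cudaError_t`, the success value `cudaSuccess`, and the message would come from `cudaGetErrorString`.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-in for cudaError_t (assumption: lets the sketch build without CUDA).
enum FakeStatus { kFakeSuccess = 0, kFakeOutOfMemory = 2 };

// Stand-in for cudaGetErrorString.
inline const char* FakeStatusString(FakeStatus s) {
  return s == kFakeSuccess ? "success" : "out of memory";
}

// Hypothetical macro: evaluates the call once, and on failure throws an
// exception carrying the file, line, call text, and error string.
#define CHECKED_CALL(expr)                                     \
  do {                                                         \
    FakeStatus status_ = (expr);                               \
    if (status_ != kFakeSuccess) {                             \
      std::ostringstream oss_;                                 \
      oss_ << __FILE__ << ":" << __LINE__ << ": " << #expr     \
           << " failed: " << FakeStatusString(status_);        \
      throw std::runtime_error(oss_.str());                    \
    }                                                          \
  } while (0)

// Stand-in for a status-returning runtime call such as an allocation.
inline FakeStatus FakeAlloc(void** ptr, std::size_t bytes) {
  static char buffer[1024];
  if (bytes > sizeof(buffer)) return kFakeOutOfMemory;
  *ptr = buffer;
  return kFakeSuccess;
}
```

Writing `CHECKED_CALL(FakeAlloc(&p, 16));` instead of a bare `FakeAlloc(&p, 16);` means a failing call can no longer be silently ignored, which is the usual motivation for wrapping every runtime call.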