- 13 Sep, 2024 1 commit
sangwzh authored
-
- 29 Feb, 2024 1 commit
Muhammed Fatih BALIN authored
-
- 07 Nov, 2022 2 commits
Hongzhi (Steve), Chen authored
* [Misc] clang-format auto fix.
* blabla
* nolint
* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
Hongzhi (Steve), Chen authored
* replace
* blabla
* balbla
* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
- 06 Nov, 2022 1 commit
Hongzhi (Steve), Chen authored
* param
* brief
* note
* return
* tparam
* brief2
* file
* return2
* return
* blabla
* all
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
- 04 Nov, 2022 1 commit
Hongzhi (Steve), Chen authored
* [Misc] clang-format auto fix.
* fix
* manual
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
- 06 Sep, 2022 1 commit
Chang Liu authored
* Use an internal cuda stream for CopyDataFromTo
* small fix white space
* Fix to compile
* Make stream optional in copydata for compile
* fix lint issue
* Update cub functions to use internal stream
* Lint check
* Update CopyTo/CopyFrom/CopyFromTo to use internal stream
* Address comments
* Fix backward CUDA stream
* Avoid overloading CopyFromTo()
* Minor comment update
* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>
-
- 06 Jun, 2022 1 commit
Xin Yao authored
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
-
- 06 Sep, 2021 1 commit
Jinjing Zhou authored
* remove
* remove
* fix
* remove
* remove
-
- 22 Mar, 2021 1 commit
nv-dlasalle authored
[Bugfix] Wrap cub with CUB_NS_PREFIX and remove dependency on Thrust to linking issues with Torch 1.8 (#2758)
* Wrap cub with prefixes and remove thrust
* Using counting iterator
Co-authored-by: Zihao Ye <expye@outlook.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 08 Feb, 2021 1 commit
nv-dlasalle authored
* Add start of to_block gpu implementation
* Pull in more changes from 0.4.2 cuda_to_block
* Move more code to IdArray
* Refactor DeviceNodeMapMaker
* Updates
* get compiling
* Integrate to_block
* Fix ID allocation
* Minor fixes
* Cleanup cuda calls to use cuda_common
* Reduce kernel calls
* Lint cleanup
* Expand documentation
* Remove unused function
* Rename variables for consistency
* Add doxygen comments
* Fix file extension
* Remove raw asynccopy for deviceapi
* Remove unused function
* Fix block/tile configuration
* Add cuda_device_common.cuh
* Add basic hashtable
* Migrate part of hashtable
* Refactor to use external hashtable
* Make functions members
* Format hash table functions
* Migrate duplicate filling
* Move last function over
* Refactor with cu file
* lint c++ code
* Move context check to C++ code
* Use macro switch
* Add missing files
* Update docstring
* update docs
* Move atomic functions
* Refactor hashtable
* Fix linting
* Expand docs
* Fix mismatched argument names
* Switch doxygen comments from using @param to \param
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-