- 01 Mar, 2022 1 commit
Quan (Andy) Gan authored
* fix * explain * oops
-
- 21 Oct, 2021 1 commit
Xin Yao authored
* gpu compact graph template * cuda compact graph draft * fix typo * compact graphs * pass unit test but fail in training * example using EdgeDataLoader on the GPU * refactor cuda_compact_graph and cuda_to_block * update training scripts * fix linting * fix linting * fix exclude_edges for the GPU * add --data-cpu & fix copyright
-
- 16 Sep, 2021 1 commit
nv-dlasalle authored
[Performance][Feature] Add `src_nodes` parameter to `to_block()` to avoid the cost of running unique() when the source nodes are available. (#2973) * Add lhs_nodes as parameter to to_block * Update unit test * Switch to simplified node conversion * Switch lhs_nodes to be in/out parameter * Update docs
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 15 Jun, 2021 1 commit
Tianqi Zhang (张天启) authored
* add bruteforce impl * add nn descent implementation * change doc-string * remove redundant func * use local rng for cuda * fix lint * fix lint * fix bug * fix bug * wrap nndescent_knn_graph into knn * fix lint * change function names * add comment for dist funcs * let the compiler do the unrolling * use better blocksize setting * remove redundant line * check the return of the cub calls
Co-authored-by: Tong He <hetong007@gmail.com>
-
- 13 Jun, 2021 1 commit
nv-dlasalle authored
[Performance] Perform to_block on the GPU when the dataloader is created with a GPU `device`. (#3016) * add output device for dataloading * Update dataloader * Get sampler device from dataloader * Fix line length * Update examples * Fix to_block GPU for empty relation types * Handle the case where the DistGraph has None for the underlying graph
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
-
- 19 May, 2021 1 commit
Tianqi Zhang (张天启) authored
* add bruteforce impl * add support for bruteforce-sharemem * modify python API * add tests * change file path * change python API * fix lint * fix test * also check worst_dist in the last few dim * use heap and early-stop on CPU * fix lint * fix lint * add device check * use cuda function to determine max shared mem * use cuda to determine block info * add memory free for tmp var * update doc-string and add dist option * fix lint * add more tests
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 08 Feb, 2021 1 commit
nv-dlasalle authored
* Add start of to_block gpu implementation * Pull in more changes from 0.4.2 cuda_to_block * Move more code to IdArray * Refactor DeviceNodeMapMaker * Updates * get compiling * Integrate to_block * Fix ID allocation * Minor fixes * Cleanup cuda calls to use cuda_common * Reduce kernel calls * Lint cleanup * Expand documentation * Remove unused function * Rename variables for consistency * Add doxygen comments * Fix file extension * Remove raw asynccopy for deviceapi * Remove unused function * Fix block/tile configuration * Add cuda_device_common.cuh * Add basic hashtable * Migrate part of hashtable * Refactor to use external hashtable * Make functions members * Format hash table functions * Migrate duplicate filling * Move last function over * Refactor with cu file * lint c++ code * Move context check to C++ code * Use macro switch * Add missing files * Update docstring * update docs * Move atomic functions * Refactor hashtable * Fix linting * Expand docs * Fix mismatched argument names * Switch doxygen comments from using @param to \param
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-