- 20 May, 2021 1 commit
nv-dlasalle authored
[Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings and sparse gradients. (#2825)
* Split NCCL wrapper from sparse optimizer and sparse embedding
* Add more unit tests for single node nccl
* Fix unit test for tf
* Switch to device histogram
* Fix histogram issues
* Finish migration to histogram
* Handle cases with zero send/receive data
* Start on partition object
* Get compiling
* Updates
* Add unit tests
* Switch to partition object
* Fix linting issues
* Rename partition file
* Add python doc
* Fix python assert and finish doxygen comments
* Remove stubs for range based partition to satisfy pylint
* Wrap unit test in GPU only
* Wrap explicit cuda call in ifdef
* Merge with partition.py
* update docstrings
* Cleanup partition_op
* Add Workspace object
* Switch to using workspace object
* Move last remainder based function out of nccl_api
* Add error messages
* Update docs with examples
* Fix linting errors
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
- 28 Jun, 2020 1 commit
Minjie Wang authored
* add cub; array cumsum
* CSRSliceRows
* fix warning
* operator << for ndarray; CSRSliceRows
* add CSRIsSorted
* add csr_sort
* inplace coosort and outplace csrsort
* WIP: coo is sorted
* mv cuda_utils
* add AllTrue utility
* csr sort
* coo sort
* coo2csr for sorted coo arrays
* CSRToCOO from sorted
* pass tests for the new kernel changes
* cannot use inplace sort
* lint
* try fix msvc error
* Fix g.copy_to and g.asnumbits; ToBlock no longer uses CSC
* stash
* revert some hack
* revert some changes
* address comments
* fix
* fix to_block unittest
* add todo note
- 31 Jan, 2020 1 commit
Quan (Andy) Gan authored
* trying to refactor IndexSelect
* partial implementation
* add index select and assign for floats as well
* move to random choice source
* more updates
* fixes
* fixes
* more fixes
* adding python impl
* fixes
* unit test
* lint
* lint x2
* lint x3
* update metapath2vec
* debugging performance
* still debugging for performance
* tuning
* switching to succvec
* redo
* revert non-uniform sampler to use vector
* still not fast
* why does this crash with OpenMP???
* because there was a data race!!!
* add documentations and remove assign op
* lint
* lint x2
* lol what have i done
* lint x3
* fix and disable gpu testing
* bugfix
* generic random walk
* reorg the random walk source code
* Update randomwalks.h
* Update randomwalks_cpu.cc
* rename file
* move internal function to anonymous ns
* reorg & docstrings
* constant restart probability
* docstring fix
* more commit
* random walk with restart, tested
* some fixes
* switch to using NDArray for choice
* massive fix & docstring
* lint x?
* lint x??
* fix
* export symbols
* skip gpu test
* addresses comments
* replaces another VecToIdArray
* add randomwalks.h to include
* replace void * with template