- 11 Jun, 2022 1 commit
-
-
Xin Yao authored
* Wrap all CUDA runtime API/CUB calls with macro * remove the usage of explicit cudaMalloc in favor of AllocWorkspace * fix typo
Co-authored-by: Israt Nisa <neesha295@gmail.com>
-
- 08 Jun, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] enable time out when fetching msg * fix lint error * minor refinements * improve minor log * fix dist test * fix timeout issue in tensorpipe
-
- 06 Jun, 2022 1 commit
-
-
Xin Yao authored
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
-
- 28 May, 2022 2 commits
-
-
Quan (Andy) Gan authored
* change warning message * Update tensordispatch.cc
-
Quan (Andy) Gan authored
This reverts commit fdd1fe19.
-
- 16 May, 2022 1 commit
-
-
nv-dlasalle authored
* Explicitly unpin tensoradapter allocated arrays * Undo unrelated change * Add unit test * update unit test
-
- 12 May, 2022 1 commit
-
-
nv-dlasalle authored
-
- 05 Apr, 2022 1 commit
-
-
nv-dlasalle authored
[Examples] Update graphsage multi-gpu example to use multiple GPUs for validation and testing. (#3827) * Update graphsage multi-gpu example to use multiple GPUs for validation and testing. * Remove argmax * Fix rebase error * Add more documentation to example and simplify * Switch to name shared memory * Add comment about how training is distributed * Restore iteration count * fix munmap error reporting for better error messages
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
-
- 21 Feb, 2022 1 commit
-
-
Quan (Andy) Gan authored
* fixes * fix * more fixes * update * oops * lint? * temporarily revert - will fix in another PR * more fixes * skipping mxnet test * address comments * fix DDP * fix edge dataloader exclusion problems * stupid bug * fix * use_uvm option * fix * fixes * fixes * fixes * fixes * add evaluation for cluster gcn and ddp * stupid bug again * fixes * move sanity checks to only support DGLGraphs * pytorch lightning compatibility fixes * remove * poke * more fixes * fix * fix * disable test * docstrings * why is it getting a memory leak? * fix * update * updates and temporarily disable forkingpickler * update * fix? * fix? * oops * oops * fix * lint * huh * uh * update * fix * made it memory efficient * refine exclude interface * fix tutorial * fix tutorial * fix graph duplication in CPU dataloader workers * lint * lint * Revert "lint" This reverts commit 805484dd553695111b5fb37f2125214a6b7276e9. * Revert "lint" This reverts commit 0bce411b2b415c2ab770343949404498436dc8b2. * Revert "fix graph duplication in CPU dataloader workers" This reverts commit 9e3a8cf34c175d3093c773f6bb023b155f2bd27f.
Co-authored-by: xiny <xiny@nvidia.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 18 Feb, 2022 1 commit
-
-
Jinjing Zhou authored
* add * fix * fix * fix * fix * add * add * fix * fix * fix * new loader * fix * fix * fix for 3.6 * fix * add * add recipes and also some bug fixes * fix * fix * fix * fix recipes * allow AsNodeDataset to work on ogb * add ut * many fixes for nodepred-ns pipeline * recipe for nodepred-ns * Update enter/README.md Co-authored-by: Zihao Ye <zihaoye.cs@gmail.com> * fix layers * fix * fix * fix * fix * fix multiple issues * fix for citation2 * fix comment * fix * fix * clean up * fix
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
Co-authored-by: Zihao Ye <zihaoye.cs@gmail.com>
-
- 09 Feb, 2022 1 commit
-
-
Xin Yao authored
* implement pin_memory/unpin_memory/is_pinned for dgl.graph * update python docstring * update c++ docstring * add test * fix the broken UnifiedTensor * XPU_SWITCH for kDLCPUPinned * a rough version ready for testing * eliminate extra context parameter for pin/unpin * update train_sampling * fix linting * fix typo * multi-gpu uva sampling case * disable new format materialization for pinned graphs * update python doc for pin_memory_ * fix unit test * UVA sampling for link prediction * dispatch most csr ops * update graphsage example to combine uva sampling and UnifiedTensor * update graphsage example to combine uva sampling and UnifiedTensor * update graphsage example to combine uva sampling and UnifiedTensor * update doc * update examples * change unitgraph and heterograph's PinMemory to in-place * update examples for multi-gpu uva sampling * update doc * fix linting * fix cpu build * fix is_pinned for DistGraph * fix is_pinned for DistGraph * update graphsage unsupervised example * update doc for gpu sampling * update some check for sampling device switching * fix linting * adapt for new dataloader * fix linting * fix * fix some name issue * adjust device check * add unit test for uva sampling & fix some zero_copy bug * fix linting * update num_threads in graphsage examples
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 21 Jan, 2022 1 commit
-
-
Xin Yao authored
* implement pin_memory/unpin_memory/is_pinned for dgl.graph * update python docstring * update c++ docstring * add test * fix the broken UnifiedTensor * eliminate extra context parameter for pin/unpin * fix linting * fix typo * disable new format materialization for pinned graphs * update python doc for pin_memory_ * fix unit test * update doc * change unitgraph and heterograph's PinMemory to in-place * update comments for NDArray's PinMemory_ and PinData * update doc
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 07 Jan, 2022 1 commit
-
-
Quan (Andy) Gan authored
* first commit * a bunch of fixes * add unique * lint * lint * lint * address comments * Update negative_sampler.py * fix * description * address comments and fix * fix * replace unique with replace * test pylint * Update negative_sampler.py
-
- 04 Jan, 2022 1 commit
-
-
Quan (Andy) Gan authored
* support shared memory on windows * Update shared_mem.cc
-
- 18 Oct, 2021 1 commit
-
-
nv-dlasalle authored
-
- 15 Oct, 2021 1 commit
-
-
David Min authored
* Add pytorch-direct version * remove * add documentation for UnifiedTensor * Revert "add documentation for UnifiedTensor" This reverts commit 63ba42644d4aba197c1cb4ea4b85fa1bc43b8849. * add boundary check for UVM IndexSelect * relocate boundary check index kernels to cuda * fix function name * fix indexkernel in nccl api * fix argument ordering * simplify code * Add a comment for the uvm version
Co-authored-by: shhssdm <shhssdm@gmail.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 14 Oct, 2021 1 commit
-
-
zexi yuan authored
* [Bugfix] Fix a compile error for the Debug build type on the Windows platform. When using CMakeLists.txt to build the "Debug" build type on Windows, there are three compile errors (C4716) in the file "dgl\src\runtime\shared_mem.cc": 'dgl::runtime::SharedMemory::CreateNew': must return a value; 'dgl::runtime::SharedMemory::Open': must return a value; 'dgl::runtime::SharedMemory::Exist': must return a value.
* [Bugfix] Fix the CMake error "cannot find load file" when DGL is used as a subdirectory on Linux. When DGL is used as a subdirectory in a CMake project, "CMAKE_SOURCE_DIR" returns the parent CMake scope directory, which is not the expected directory. It may be better to use "CMAKE_CURRENT_SOURCE_DIR" to set "GKLIB_PATH".
* [Bugfix] Fix the cmd CMake error when DGL is used as a subdirectory. When DGL is a subdirectory of another project, the WORKING_DIRECTORY of "add_custom_command" at line 255 of "CMakeLists.txt" is incorrect, which causes a CMake "setlocal" error.
-
- 29 Sep, 2021 1 commit
-
-
Rhett Ying authored
* [Feature] enable create/set/free cuda stream for internal use * add unit test * fix unit test failure on mxnet and tf * refactor stream wrapper * fix lint error * fix lint error
-
- 28 Sep, 2021 1 commit
-
-
Jingcheng Yu authored
Co-authored-by: JingchengYu94 <jingchengyu94@gmail.com>
-
- 06 Sep, 2021 1 commit
-
-
Jinjing Zhou authored
* remove * remove * fix * remove * remove
-
- 01 Sep, 2021 1 commit
-
-
Rhett Ying authored
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 19 Aug, 2021 1 commit
-
-
nv-dlasalle authored
* Update filter code * Add unit tests * Fixes * Switch to indices * Rename functions * Fix linting * Fix whitespace * Add doc * Fix heterograph * Change workspace allocation * Fix linting * Fix docs in filter.py * Add todo
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
-
- 08 Aug, 2021 1 commit
-
-
nv-dlasalle authored
* Only link tensordispatcher against pytorch * Only modify libraries when not using MSVC
-
- 16 Jul, 2021 1 commit
-
-
David Min authored
[Feature][Performance][GPU] Introducing UnifiedTensor for efficient zero-copy host memory access from GPU (#3086) * Add pytorch-direct version * Initial commit of unified tensor * Merge branch 'master' of https://github.com/davidmin7/dgl * Remove unnecessary things * Fix error message * Fix/Add descriptions * whitespace fix * add unpin * disable IndexSelectCPUFromGPU with no CUDA * add a newline for unified_tensor.py * Apply changes based on feedback * add 'os' module * skip unified tensor unit test for cpu only * Update tests/pytorch/test_unified_tensor.py Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com> * reflect feedback
Co-authored-by: shhssdm <shhssdm@gmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 08 Jul, 2021 1 commit
-
-
Quan (Andy) Gan authored
-
- 02 Jul, 2021 1 commit
-
-
nv-dlasalle authored
* Add dgl.utils.is_sorted_srcdst * Fix linting issues * delete blank line * Specify datatype to index tensor in test * Force integer conversion
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
-
- 27 Jun, 2021 1 commit
-
-
Jinjing Zhou authored
* fix * remove nvidiasmi * fix * fix docs * fix * fix * 1 * fix * remove * skip deprecated kernel * fix * Revert "skip deprecated kernel" This reverts commit c5ceb7f60dbbaf065b81cc3680757fd611d90ad3. * fix
-
- 23 Jun, 2021 1 commit
-
-
nv-dlasalle authored
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 11 Jun, 2021 2 commits
-
-
Tomasz Patejko authored
-
nv-dlasalle authored
* Split from NCCL PR * Fix type in comment * Expand documentation for sparse_all_to_all_push * Restore previous behavior in example * Re-work optimizer to use NCCL based on gradient location * Allow for running with embedding on CPU but using NCCL for gradient exchange * Optimize single partition case * Fix pylint errors * Add missing include * fix gradient indexing * Fix line continuation * Migrate 'first_step' * Skip tests without enough GPUs to run NCCL * Improve empty tensor handling for pytorch 1.5 * Fix indentation * Allow multiple NCCL communicator to coexist * Improve handling of empty message * Update python/dgl/nn/pytorch/sparse_emb.py Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com> * Update python/dgl/nn/pytorch/sparse_emb.py Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com> * Keep empty tensor dimensionless * th.empty -> th.tensor * Preserve shape for empty non-zero dimension tensors * Use shared state, when embedding is shared * Add support for gathering an embedding * Fix typo * Fix more typos * Fix backend call * Use NodeDataLoader to take advantage of ddp * Update training script to share memory * Only squeeze last dimension * Better handle empty message * Keep embedding on the target device GPU if dgl_sparse is false in RGCN example * Fix typo in comment * Add asserts * Improve documentation in example
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 20 May, 2021 1 commit
-
-
nv-dlasalle authored
[Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings and sparse gradients. (#2825) * Split NCCL wrapper from sparse optimizer and sparse embedding * Add more unit tests for single node nccl * Fix unit test for tf * Switch to device histogram * Fix histogram issues * Finish migration to histogram * Handle cases with zero send/receive data * Start on partition object * Get compiling * Updates * Add unit tests * Switch to partition object * Fix linting issues * Rename partition file * Add python doc * Fix python assert and finish doxygen comments * Remove stubs for range based partition to satisfy pylint * Wrap unit test in GPU only * Wrap explicit cuda call in ifdef * Merge with partition.py * update docstrings * Cleanup partition_op * Add Workspace object * Switch to using workspace object * Move last remainder based function out of nccl_api * Add error messages * Update docs with examples * Fix linting errors
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 27 Apr, 2021 1 commit
-
-
Israt Nisa authored
* init cuda support * cuSPARSE err * passed unittest for csr_mm/SpGEMM. int64 not supported * Debugging cuSPARSE error 3 * csrgeam only supports int32? * disabling int64 for cuda * refactor and add CSRMask * lint * oops * remove todo * rewrite CSRMask with CSRGetData * lint * fix test * address comments * lint * fix * addresses comments and rename BUG_ON
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-71.ec2.internal>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 22 Mar, 2021 1 commit
-
-
nv-dlasalle authored
[Bugfix] Wrap cub with CUB_NS_PREFIX and remove dependency on Thrust to fix linking issues with Torch 1.8 (#2758) * Wrap cub with prefixes and remove thrust * Using counting iterator
Co-authored-by: Zihao Ye <expye@outlook.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 09 Mar, 2021 1 commit
-
-
Tianqi Zhang (张天启) authored
* finish graph matching gpu version * use C++ shuffle * finish graph matching * fix bug * fix bug * change name and use swap * upt * fix format problem * fix format problem * stronger test * upt * upt * change python api * upt * upt * format check * upt * upt * fix bug
Co-authored-by: Tong He <hetong007@gmail.com>
-
- 08 Feb, 2021 1 commit
-
-
nv-dlasalle authored
* Add start of to_block gpu implementation * Pull in more changes from 0.4.2 cuda_to_block * Move more code to IdArray * Refactor DeviceNodeMapMaker * Updates * get compiling * Integrate to_block * Fix ID allocation * Minor fixes * Cleanup cuda calls to use cuda_common * Reduce kernel calls * Lint cleanup * Expand documentation * Remove unused function * Rename variables for consistency * Add doxygen comments * Fix file extension * Remove raw asynccopy for deviceapi * Remove unused function * Fix block/tile configuration * Add cuda_device_common.cuh * Add basic hashtable * Migrate part of hashtable * Refactor to use external hashtable * Make functions members * Format hash table functions * Migrate duplicate filling * Move last function over * Refactor with cu file * lint c++ code * Move context check to C++ code * Use macro switch * Add missing files * Update docstring * update docs * Move atomic functions * Refactor hashtable * Fix linting * Expand docs * Fix mismatched argument names * Switch doxygen comments from using @param to \param
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 29 Jan, 2021 1 commit
-
-
Quan (Andy) Gan authored
-
- 28 Jan, 2021 1 commit
-
-
Zihao Ye authored
* add tvm as submodule * compilation is ok but calling fails * can call now * pack multiple modules, change names * upd * upd * upd * fix cmake * upd * upd * upd * upd * fix * relative path * upd * upd * upd * singleton * upd * trigger * fix * upd * count reducible * upd * upd * upd * upd * upd * upd * upd * upd * upd * only keep related files * upd * upd * upd * upd * lint * lint * lint * lint * pylint * upd * upd * compilation * fix * upd * upd * upd * upd * upd * upd * upd doc * refactor * fix * upd number
Co-authored-by: Zhi Lin <linzhilynn@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-42-78.us-east-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-21-156.us-east-2.compute.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 27 Jan, 2021 1 commit
-
-
xiang song(charlie.song) authored
* Add sparse embedding for dgl and update rgcn example * upd * Fix * Revert "Fix" This reverts commit 4da87cdfb8b8c3506b7fc7376cd2385ba8045c2a. * Fix * upd * upd * Fix * Add unit test and update impl * fix * Clean up rgcn example code * upd * upd * update * Fix * update score * sparse for sage * remove model sparse * upd * upd * remove global norm * revert delete model_sparse.py * update according to comments * Fix doc * upd * Fix test * upd * lint * lint * lint * upd * upd * clean up
Co-authored-by: Ubuntu <ubuntu@ip-172-31-56-220.ec2.internal>
-
- 25 Jan, 2021 1 commit
-
-
Da Zheng authored
* Distributed heterograph (#3) * heterogeneous graph partition. * fix graph partition book for heterograph. * load heterograph partitions. * update DistGraphServer to support heterograph. * make DistGraph runnable for heterograph. * partition a graph and store parts with homogeneous graph structure. * update DistGraph server&client to use homogeneous graph. * shuffle node Ids based on node types. * load mag in heterograph. * fix per-node-type mapping. * balance node types. * fix for homogeneous graph * store etype for now. * fix data name. * fix a bug in example. * add profiler in rgcn. * heterogeneous RGCN. * map homogeneous node ids to hetero node ids. * fix graph partition book. * fix DistGraph. * shuffle eids. * verify eids and their mappings when loading a partition. * Id map from homogeneous Ids to per-type Ids. * verify partitioned results. * add test for distributed sampler. * add mapping from per-type Ids to homogeneous Ids. * update example. * fix DistGraph. * Revert "add profiler in rgcn." This reverts commit 36daaed8b660933dac8f61a39faec3da2467d676. * add tests for homogeneous graphs. * fix a bug. * fix test. * fix for one partition. * fix for standalone training and evaluation. * small fix. * fix two bugs. * initialize projection matrix. * small fix on RGCN. * Fix rgcn performance (#17) Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal> * fix lint. * fix lint. * fix lint. * fix lint. * fix lint. * fix lint. * fix. * fix test. * fix lint. * test partitions. * remove redundant test for partitioning. * remove commented code. * fix partition. * fix tests. * fix RGCN. * fix test. * fix test. * fix test. * fix. * fix a bug. * update dmlc-core. * fix. * fix rgcn. * update readme. * add comments. Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal> Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal> * fix. * fix. * add div_int. * fix. * fix. * fix lint. * fix. * fix. * fix. * adjust. * move code. * handle heterograph. * return pytorch tensor in GPB. * remove some tests in example. * add to_block for distributed training. * use distributed to_block. * remove unnecessary function in DistGraph. * remove distributed to_block. * use pytorch tensor. * fix a bug in ntypes and etypes. * enable norm. * make the data loader compatible with the old format. * fix. * add comments. * fix a bug. * add test for heterograph. * support partition without reshuffle. * add test. * support partition without reshuffle. * fix. * add test. * fix bugs. * fix lint. * fix dataset. * fix for mxnet. * update docstring. * rename to floor_div * avoid exposing NodePartitionPolicy and EdgePartitionPolicy. * fix docstring. * fix error. * fixes. * fix comments. * rename. * rename. * explain IdMap. * fix docstring. * fix docstring. * update docstring. * remove the code of returning heterograph. * remove argument. * fix example. * make GraphPartitionBook an abstract class. * fix. * fix. * fix a bug. * fix a bug in example * fix a bug * reverse heterograph sampling. * temp fix. * fix lint. * Revert "temp fix." This reverts commit c450717b9f578b8c48769c675f2a19d6c1e64381. * compute norm. * Revert "reverse heterograph sampling." This reverts commit bd6deb7f52998de76508f800441ff518e2fadcb9. * fix. * move id_map.py * remove check * add more comments. * update docstring.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
-
- 14 Jan, 2021 1 commit
-
-
Quan (Andy) Gan authored
* fix munmap() using wrong parameter * rename variables
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-