1. 11 Jun, 2022 1 commit
  2. 08 Jun, 2022 1 commit
  3. 06 Jun, 2022 1 commit
  4. 28 May, 2022 2 commits
  5. 16 May, 2022 1 commit
  6. 12 May, 2022 1 commit
  7. 05 Apr, 2022 1 commit
    • [Examples] Update graphsage multi-gpu example to use multiple GPUs for validation and testing. (#3827) · 27a6eb56
      nv-dlasalle authored
      
      * Update graphsage multi-gpu example to use multiple GPUs for validation and testing.
      
      * Remove argmax
      
      * Fix rebase error
      
      * Add more documentation to example and simplify
      
      * Switch to named shared memory
      
      * Add comment about how training is distributed
      
      * Restore iteration count
      
      * fix munmap error reporting for better error messages
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
  8. 21 Feb, 2022 1 commit
    • [Bugfix] Bug fixes in new dataloader (#3727) · 3f138eba
      Quan (Andy) Gan authored
      
      
      * fixes
      
      * fix
      
      * more fixes
      
      * update
      
      * oops
      
      * lint?
      
      * temporarily revert - will fix in another PR
      
      * more fixes
      
      * skipping mxnet test
      
      * address comments
      
      * fix DDP
      
      * fix edge dataloader exclusion problems
      
      * stupid bug
      
      * fix
      
      * use_uvm option
      
      * fix
      
      * fixes
      
      * fixes
      
      * fixes
      
      * fixes
      
      * add evaluation for cluster gcn and ddp
      
      * stupid bug again
      
      * fixes
      
      * move sanity checks to only support DGLGraphs
      
      * pytorch lightning compatibility fixes
      
      * remove
      
      * poke
      
      * more fixes
      
      * fix
      
      * fix
      
      * disable test
      
      * docstrings
      
      * why is it getting a memory leak?
      
      * fix
      
      * update
      
      * updates and temporarily disable forkingpickler
      
      * update
      
      * fix?
      
      * fix?
      
      * oops
      
      * oops
      
      * fix
      
      * lint
      
      * huh
      
      * uh
      
      * update
      
      * fix
      
      * made it memory efficient
      
      * refine exclude interface
      
      * fix tutorial
      
      * fix tutorial
      
      * fix graph duplication in CPU dataloader workers
      
      * lint
      
      * lint
      
      * Revert "lint"
      
      This reverts commit 805484dd553695111b5fb37f2125214a6b7276e9.
      
      * Revert "lint"
      
      This reverts commit 0bce411b2b415c2ab770343949404498436dc8b2.
      
      * Revert "fix graph duplication in CPU dataloader workers"
      
      This reverts commit 9e3a8cf34c175d3093c773f6bb023b155f2bd27f.
      Co-authored-by: xiny <xiny@nvidia.com>
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
  9. 18 Feb, 2022 1 commit
  10. 09 Feb, 2022 1 commit
    • [Feature] CUDA UVA sampling for MultiLayerNeighborSampler (#3674) · 738e8318
      Xin Yao authored
      
      
      * implement pin_memory/unpin_memory/is_pinned for dgl.graph
      
      * update python docstring
      
      * update c++ docstring
      
      * add test
      
      * fix the broken UnifiedTensor
      
      * XPU_SWITCH for kDLCPUPinned
      
      * a rough version ready for testing
      
      * eliminate extra context parameter for pin/unpin
      
      * update train_sampling
      
      * fix linting
      
      * fix typo
      
      * multi-gpu uva sampling case
      
      * disable new format materialization for pinned graphs
      
      * update python doc for pin_memory_
      
      * fix unit test
      
      * UVA sampling for link prediction
      
      * dispatch most csr ops
      
      * update graphsage example to combine uva sampling and UnifiedTensor
      
      * update graphsage example to combine uva sampling and UnifiedTensor
      
      * update graphsage example to combine uva sampling and UnifiedTensor
      
      * update doc
      
      * update examples
      
      * change unitgraph and heterograph's PinMemory to in-place
      
      * update examples for multi-gpu uva sampling
      
      * update doc
      
      * fix linting
      
      * fix cpu build
      
      * fix is_pinned for DistGraph
      
      * fix is_pinned for DistGraph
      
      * update graphsage unsupervised example
      
      * update doc for gpu sampling
      
      * update some check for sampling device switching
      
      * fix linting
      
      * adapt for new dataloader
      
      * fix linting
      
      * fix
      
      * fix some name issue
      
      * adjust device check
      
      * add unit test for uva sampling & fix some zero_copy bug
      
      * fix linting
      
      * update num_threads in graphsage examples
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
  11. 21 Jan, 2022 1 commit
    • [Feature] Pin dgl.graph to the page-locked memory (#3616) · 40b44a43
      Xin Yao authored
      
      
      * implement pin_memory/unpin_memory/is_pinned for dgl.graph
      
      * update python docstring
      
      * update c++ docstring
      
      * add test
      
      * fix the broken UnifiedTensor
      
      * eliminate extra context parameter for pin/unpin
      
      * fix linting
      
      * fix typo
      
      * disable new format materialization for pinned graphs
      
      * update python doc for pin_memory_
      
      * fix unit test
      
      * update doc
      
      * change unitgraph and heterograph's PinMemory to in-place
      
      * update comments for NDArray's PinMemory_ and PinData
      
      * update doc
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
  12. 07 Jan, 2022 1 commit
    • [Feature] Negative sampling (#3599) · 90f10b31
      Quan (Andy) Gan authored
      * first commit
      
      * a bunch of fixes
      
      * add unique
      
      * lint
      
      * lint
      
      * lint
      
      * address comments
      
      * Update negative_sampler.py
      
      * fix
      
      * description
      
      * address comments and fix
      
      * fix
      
      * replace unique with replace
      
      * test pylint
      
      * Update negative_sampler.py
  13. 04 Jan, 2022 1 commit
  14. 18 Oct, 2021 1 commit
  15. 15 Oct, 2021 1 commit
  16. 14 Oct, 2021 1 commit
    • [Bugfix] three bugs related to using DGL as a subdirectory (third_party) of another project. (#3379) · 18863069
      zexi yuan authored
      * [Bugfix] fix a compile error for Debug-BuildType on Windows Platform
      
      When building the "Debug" BuildType with CMakeLists.txt on the Windows platform, there are three compile errors (C4716) in "dgl\src\runtime\shared_mem.cc":
      
      'dgl::runtime::SharedMemory::CreateNew': must return a value
      'dgl::runtime::SharedMemory::Open': must return a value
      'dgl::runtime::SharedMemory::Exist': must return a value
      
      * [Bugfix] cmake error "cannot find load file" when DGL as a sub_directory on Linux
      
      When DGL is used as a subdirectory in a CMake project, "CMAKE_SOURCE_DIR" resolves to the parent project's source dir, which is not the expected dir.
      It is better to use "CMAKE_CURRENT_SOURCE_DIR" to set "GKLIB_PATH".
      
      * [Bugfix] cmd cmake error when DGL as a subdirectory
      
      When DGL is a subdirectory of another project, the WORKING_DIRECTORY of "add_custom_command" at line 255 of "CMakeLists.txt" is incorrect, causing a CMake "setlocal" error.
  17. 29 Sep, 2021 1 commit
  18. 28 Sep, 2021 1 commit
  19. 06 Sep, 2021 1 commit
  20. 01 Sep, 2021 1 commit
  21. 19 Aug, 2021 1 commit
  22. 08 Aug, 2021 1 commit
  23. 16 Jul, 2021 1 commit
  24. 08 Jul, 2021 1 commit
  25. 02 Jul, 2021 1 commit
  26. 27 Jun, 2021 1 commit
    • [Build] Make nccl optional (#3056) · 9664cdff
      Jinjing Zhou authored
      * fix
      
      * remove nvidiasmi
      
      * fix
      
      * fix docs
      
      * fix
      
      * fix
      
      * 1
      
      * fix
      
      * remove
      
      * skip deprecated kernel
      
      * fix
      
      * Revert "skip deprecated kernel"
      
      This reverts commit c5ceb7f60dbbaf065b81cc3680757fd611d90ad3.
      
      * fix
  27. 23 Jun, 2021 1 commit
  28. 11 Jun, 2021 2 commits
    • Tomasz Patejko
    • [Feature] Allow using NCCL for communication in dgl.NodeEmbedding and dgl.SparseOptimizer (#2824) · 17d604b5
      nv-dlasalle authored
      
      
      * Split from NCCL PR
      
      * Fix type in comment
      
      * Expand documentation for sparse_all_to_all_push
      
      * Restore previous behavior in example
      
      * Re-work optimizer to use NCCL based on gradient location
      
      * Allow for running with embedding on CPU but using NCCL for gradient exchange
      
      * Optimize single partition case
      
      * Fix pylint errors
      
      * Add missing include
      
      * fix gradient indexing
      
      * Fix line continuation
      
      * Migrate 'first_step'
      
      * Skip tests without enough GPUs to run NCCL
      
      * Improve empty tensor handling for pytorch 1.5
      
      * Fix indentation
      
      * Allow multiple NCCL communicator to coexist
      
      * Improve handling of empty message
      
      * Update python/dgl/nn/pytorch/sparse_emb.py
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
      
      * Update python/dgl/nn/pytorch/sparse_emb.py
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
      
      * Keep empty tensor dimensionless
      
      * th.empty -> th.tensor
      
      * Preserve shape for empty non-zero dimension tensors
      
      * Use shared state, when embedding is shared
      
      * Add support for gathering an embedding
      
      * Fix typo
      
      * Fix more typos
      
      * Fix backend call
      
      * Use NodeDataLoader to take advantage of ddp
      
      * Update training script to share memory
      
      * Only squeeze last dimension
      
      * Better handle empty message
      
      * Keep embedding on the target GPU device if dgl_sparse is false in RGCN example
      
      * Fix typo in comment
      
      * Add asserts
      
      * Improve documentation in example
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
  29. 20 May, 2021 1 commit
    • [Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings and sparse gradients. (#2825) · ae8dbe6d
      nv-dlasalle authored
      
      * Split NCCL wrapper from sparse optimizer and sparse embedding
      
      * Add more unit tests for single node nccl
      
      * Fix unit test for tf
      
      * Switch to device histogram
      
      * Fix histogram issues
      
      * Finish migration to histogram
      
      * Handle cases with zero send/receive data
      
      * Start on partition object
      
      * Get compiling
      
      * Updates
      
      * Add unit tests
      
      * Switch to partition object
      
      * Fix linting issues
      
      * Rename partition file
      
      * Add python doc
      
      * Fix python assert and finish doxygen comments
      
      * Remove stubs for range based partition to satisfy pylint
      
      * Wrap unit test in GPU only
      
      * Wrap explicit cuda call in ifdef
      
      * Merge with partition.py
      
      * update docstrings
      
      * Cleanup partition_op
      
      * Add Workspace object
      
      * Switch to using workspace object
      
      * Move last remainder based function out of nccl_api
      
      * Add error messages
      
      * Update docs with examples
      
      * Fix linting errors
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
  30. 27 Apr, 2021 1 commit
  31. 22 Mar, 2021 1 commit
  32. 09 Mar, 2021 1 commit
  33. 08 Feb, 2021 1 commit
    • [Sampling] Implement `dgl.to_block()` for the GPU (#2339) · bc3a532f
      nv-dlasalle authored
      
      
      * Add start of to_block gpu implementation
      
      * Pull in more changes from 0.4.2 cuda_to_block
      
      * Move more code to IdArray
      
      * Refactor DeviceNodeMapMaker
      
      * Updates
      
      * get compiling
      
      * Integrate to_block
      
      * Fix ID allocation
      
      * Minor fixes
      
      * Cleanup cuda calls to use cuda_common
      
      * Reduce kernel calls
      
      * Lint cleanup
      
      * Expand documentation
      
      * Remove unused function
      
      * Rename variables for consistency
      
      * Add doxygen comments
      
      * Fix file extension
      
      * Remove raw asynccopy for deviceapi
      
      * Remove unused function
      
      * Fix block/tile configuration
      
      * Add cuda_device_common.cuh
      
      * Add basic hashtable
      
      * Migrate part of hashtable
      
      * Refactor to use external hashtable
      
      * Make functions members
      
      * Format hash table functions
      
      * Migrate duplicate filling
      
      * Move last function over
      
      * Refactor with cu file
      
      * lint c++ code
      
      * Move context check to C++ code
      
      * Use macro switch
      
      * Add missing files
      
      * Update docstring
      
      * update docs
      
      * Move atomic functions
      
      * Refactor hashtable
      
      * Fix linting
      
      * Expand docs
      
      * Fix mismatched argument names
      
      * Switch doxygen comments from using @param to \param
      Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
      Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
  34. 29 Jan, 2021 1 commit
  35. 28 Jan, 2021 1 commit
  36. 27 Jan, 2021 1 commit
    • [Feature] Add support for sparse embedding (#2451) · a7e941c3
      xiang song(charlie.song) authored
      
      
      * Add sparse embedding for dgl and update rgcn example
      
      * upd
      
      * Fix
      
      * Revert "Fix"
      
      This reverts commit 4da87cdfb8b8c3506b7fc7376cd2385ba8045c2a.
      
      * Fix
      
      * upd
      
      * upd
      
      * Fix
      
      * Add unitest and update impl
      
      * fix
      
      * Clean up rgcn example code
      
      * upd
      
      * upd
      
      * update
      
      * Fix
      
      * update score
      
      * sparse for sage
      
      * remove model sparse
      
      * upd
      
      * upd
      
      * remove global norm
      
      * revert delete model_sparse.py
      
      * update according to comments
      
      * Fix doc
      
      * upd
      
      * Fix test
      
      * upd
      
      * lint
      
      * lint
      
      * lint
      
      * upd
      
      * upd
      
      * clean up
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-56-220.ec2.internal>
  37. 25 Jan, 2021 1 commit
    • [Distributed] Heterogeneous graph support (#2457) · 25ac3344
      Da Zheng authored
      
      
      * Distributed heterograph (#3)
      
      * heterogeneous graph partition.
      
      * fix graph partition book for heterograph.
      
      * load heterograph partitions.
      
      * update DistGraphServer to support heterograph.
      
      * make DistGraph runnable for heterograph.
      
      * partition a graph and store parts with homogeneous graph structure.
      
      * update DistGraph server&client to use homogeneous graph.
      
      * shuffle node Ids based on node types.
      
      * load mag in heterograph.
      
      * fix per-node-type mapping.
      
      * balance node types.
      
      * fix for homogeneous graph
      
      * store etype for now.
      
      * fix data name.
      
      * fix a bug in example.
      
      * add profiler in rgcn.
      
      * heterogeneous RGCN.
      
      * map homogeneous node ids to hetero node ids.
      
      * fix graph partition book.
      
      * fix DistGraph.
      
      * shuffle eids.
      
      * verify eids and their mappings when loading a partition.
      
      * Id map from homogeneous Ids to per-type Ids.
      
      * verify partitioned results.
      
      * add test for distributed sampler.
      
      * add mapping from per-type Ids to homogeneous Ids.
      
      * update example.
      
      * fix DistGraph.
      
      * Revert "add profiler in rgcn."
      
      This reverts commit 36daaed8b660933dac8f61a39faec3da2467d676.
      
      * add tests for homogeneous graphs.
      
      * fix a bug.
      
      * fix test.
      
      * fix for one partition.
      
      * fix for standalone training and evaluation.
      
      * small fix.
      
      * fix two bugs.
      
      * initialize projection matrix.
      
      * small fix on RGCN.
      
      * Fix rgcn performance (#17)
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix lint.
      
      * fix.
      
      * fix test.
      
      * fix lint.
      
      * test partitions.
      
      * remove redundant test for partitioning.
      
      * remove commented code.
      
      * fix partition.
      
      * fix tests.
      
      * fix RGCN.
      
      * fix test.
      
      * fix test.
      
      * fix test.
      
      * fix.
      
      * fix a bug.
      
      * update dmlc-core.
      
      * fix.
      
      * fix rgcn.
      
      * update readme.
      
      * add comments.
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
      
      * fix.
      
      * fix.
      
      * add div_int.
      
      * fix.
      
      * fix.
      
      * fix lint.
      
      * fix.
      
      * fix.
      
      * fix.
      
      * adjust.
      
      * move code.
      
      * handle heterograph.
      
      * return pytorch tensor in GPB.
      
      * remove some tests in example.
      
      * add to_block for distributed training.
      
      * use distributed to_block.
      
      * remove unnecessary function in DistGraph.
      
      * remove distributed to_block.
      
      * use pytorch tensor.
      
      * fix a bug in ntypes and etypes.
      
      * enable norm.
      
      * make the data loader compatible with the old format.
      
      * fix.
      
      * add comments.
      
      * fix a bug.
      
      * add test for heterograph.
      
      * support partition without reshuffle.
      
      * add test.
      
      * support partition without reshuffle.
      
      * fix.
      
      * add test.
      
      * fix bugs.
      
      * fix lint.
      
      * fix dataset.
      
      * fix for mxnet.
      
      * update docstring.
      
      * rename to floor_div
      
      * avoid exposing NodePartitionPolicy and EdgePartitionPolicy.
      
      * fix docstring.
      
      * fix error.
      
      * fixes.
      
      * fix comments.
      
      * rename.
      
      * rename.
      
      * explain IdMap.
      
      * fix docstring.
      
      * fix docstring.
      
      * update docstring.
      
      * remove the code of returning heterograph.
      
      * remove argument.
      
      * fix example.
      
      * make GraphPartitionBook an abstract class.
      
      * fix.
      
      * fix.
      
      * fix a bug.
      
      * fix a bug in example
      
      * fix a bug
      
      * reverse heterograph sampling.
      
      * temp fix.
      
      * fix lint.
      
      * Revert "temp fix."
      
      This reverts commit c450717b9f578b8c48769c675f2a19d6c1e64381.
      
      * compute norm.
      
      * Revert "reverse heterograph sampling."
      
      This reverts commit bd6deb7f52998de76508f800441ff518e2fadcb9.
      
      * fix.
      
      * move id_map.py
      
      * remove check
      
      * add more comments.
      
      * update docstring.
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-171.ec2.internal>
  38. 14 Jan, 2021 1 commit