"git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "44091d8b2aa9fc3e958f9340c16285dcc1bee090"
Unverified Commit bf264d00 authored by Muhammed Fatih BALIN's avatar Muhammed Fatih BALIN Committed by GitHub

[Feature] (La)yer-Neigh(bor) sampling implementation (#4668)



* adding LABOR sampling

* add ladies and pladies samplers

* fix compile error after rebase

* add reference for ladies sampler

* Improve ladies implementation.

* weighted labor sampling initial implementation draft
fix indentation and small bug in ladies script

* importance_sampling currently doesn't work with weights

* fix weighted importance sampling

* move labor example into its own folder

* lint fixes

* Improve documentation

* remove examples from the main PR

* fix linting by not using c++17 features

* fix documentation of labor_sampler.py

* update documentation for labor.py

* reformat the labor.py file with black

* fix linting errors

* replace exception use with if

* fix typo in error comment

* fixing win64 build for ci

* fix weighted implementation; it works now

* fix bug in the weighted case and importance_sampling==0

* address part of the reviews

* remove unused code paths from cuda

* remove unused code path from cpu side

* remove extra features of labor making use of random seed.

* fix exclude_edges bug

* remove pcg and seed logic from cpu implementation, seed logic should still work for cuda.

* minor style change

* refactor CPU implementation, take out the importance_sampling probability computation into a function.

* improve CUDAWorkspaceAllocator

* refactor importance_sampling part out to a function

* minor optimization

* fix linting issue

* Revert "remove pcg and seed logic from cpu implementation, seed logic should still work for cuda."

This reverts commit c250e07ac6d7e13f57e79e8a2c2f098d777378c2.

* Revert "remove extra features of labor making use of random seed."

This reverts commit 7f99034353080308f4783f27d9a08bea343fb796.

* fix the documentation

* disable NIDs

* improve the documentation in the code

* use the stream argument in pcg32 instead of skipping ahead t times; the hashmap can now be discarded since this is faster

* fix linting issue

* address another round of reviews

* further optimize CPU LABOR sampling implementation

* fix linting error

* update the comment

* reformat

* rename and rephrase comment

* fix formatting according to new linting specs

* fix compile error due to renaming, fix linting.

* lint

* rename DGLHeteroGraph to DGLGraph to match master

* replace other occurrences of DGLHeteroGraph to DGLGraph
Co-authored-by: Muhammed Fatih BALIN <m.f.balin@gmail.com>
Co-authored-by: Kaan Sancak <kaansnck@gmail.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
parent 59f3d6e0
...@@ -12,6 +12,7 @@
#include <cusparse.h>
#include <dgl/runtime/packed_func.h>
#include <memory>
#include <string>
#include "../workspace_pool.h"
...@@ -19,6 +20,53 @@
namespace dgl {
namespace runtime {
/*
 * How to use this class to get a non-blocking thrust execution policy that
 * uses DGL's memory pool and the current CUDA stream:
 *
 *   runtime::CUDAWorkspaceAllocator allocator(ctx);
 *   const auto stream = runtime::getCurrentCUDAStream();
 *   const auto exec_policy = thrust::cuda::par_nosync(allocator).on(stream);
 *
 * Now exec_policy can be passed to thrust functions.
 *
 * To get an integer array of size 1000 whose lifetime is managed by a
 * unique_ptr, use:
 *
 *   auto int_array = allocator.alloc_unique<int>(1000);
 *
 * int_array.get() gives the raw pointer.
 */
class CUDAWorkspaceAllocator {
DGLContext ctx;
public:
typedef char value_type;
void operator()(void* ptr) const {
runtime::DeviceAPI::Get(ctx)->FreeWorkspace(ctx, ptr);
}
explicit CUDAWorkspaceAllocator(DGLContext ctx) : ctx(ctx) {}
CUDAWorkspaceAllocator& operator=(const CUDAWorkspaceAllocator&) = default;
template <typename T>
std::unique_ptr<T, CUDAWorkspaceAllocator> alloc_unique(
std::size_t size) const {
return std::unique_ptr<T, CUDAWorkspaceAllocator>(
reinterpret_cast<T*>(runtime::DeviceAPI::Get(ctx)->AllocWorkspace(
ctx, sizeof(T) * size)),
*this);
}
char* allocate(std::ptrdiff_t size) const {
return reinterpret_cast<char*>(
runtime::DeviceAPI::Get(ctx)->AllocWorkspace(ctx, size));
}
void deallocate(char* ptr, std::size_t) const {
runtime::DeviceAPI::Get(ctx)->FreeWorkspace(ctx, ptr);
}
};
template <typename T>
inline bool is_zero(T size) {
  return size == 0;
......
Subproject commit 428802d1a5634f96bcd0705fab379ff0113bcf13