"git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "44091d8b2aa9fc3e958f9340c16285dcc1bee090"
Unverified Commit bf264d00 authored by Muhammed Fatih BALIN's avatar Muhammed Fatih BALIN Committed by GitHub

[Feature] (La)yer-Neigh(bor) sampling implementation (#4668)



* adding LABOR sampling

* add ladies and pladies samplers

* fix compile error after rebase

* add reference for ladies sampler

* Improve ladies implementation.

* weighted labor sampling initial implementation draft
fix indentation and small bug in ladies script

* importance_sampling currently doesn't work with weights

* fix weighted importance sampling

* move labor example into its own folder

* lint fixes

* Improve documentation

* remove examples from the main PR

* fix linting by not using c++17 features

* fix documentation of labor_sampler.py

* update documentation for labor.py

* reformat the labor.py file with black

* fix linting errors

* replace exception use with if

* fix typo in error comment

* fixing win64 build for ci

* fix weighted implementation; it works now

* fix bug in the weighted case and importance_sampling==0

* address part of the reviews

* remove unused code paths from cuda

* remove unused code path from cpu side

* remove extra features of labor making use of random seed.

* fix exclude_edges bug

* remove pcg and seed logic from cpu implementation, seed logic should still work for cuda.

* minor style change

* refactor CPU implementation, take out the importance_sampling probability computation into a function.

* improve CUDAWorkspaceAllocator

* refactor importance_sampling part out to a function

* minor optimization

* fix linting issue

* Revert "remove pcg and seed logic from cpu implementation, seed logic should still work for cuda."

This reverts commit c250e07ac6d7e13f57e79e8a2c2f098d777378c2.

* Revert "remove extra features of labor making use of random seed."

This reverts commit 7f99034353080308f4783f27d9a08bea343fb796.

* fix the documentation

* disable NIDs

* improve the documentation in the code

* use the stream argument in pcg32 instead of skipping ahead t times; the hashmap can now be discarded since this is faster

* fix linting issue

* address another round of reviews

* further optimize CPU LABOR sampling implementation

* fix linting error

* update the comment

* reformat

* rename and rephrase comment

* fix formatting according to new linting specs

* fix compile error due to renaming, fix linting.

* lint

* rename DGLHeteroGraph to DGLGraph to match master

* replace other occurrences of DGLHeteroGraph to DGLGraph
Co-authored-by: Muhammed Fatih BALIN <m.f.balin@gmail.com>
Co-authored-by: Kaan Sancak <kaansnck@gmail.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
parent 59f3d6e0
...@@ -12,6 +12,7 @@
#include <cusparse.h>
#include <dgl/runtime/packed_func.h>
#include <memory>
#include <string>
#include "../workspace_pool.h"
...@@ -19,6 +20,53 @@
namespace dgl {
namespace runtime {
/*
 * How to use this class to get a non-blocking thrust execution policy that
 * uses DGL's memory pool and the current CUDA stream:
 *
 *   runtime::CUDAWorkspaceAllocator allocator(ctx);
 *   const auto stream = runtime::getCurrentCUDAStream();
 *   const auto exec_policy = thrust::cuda::par_nosync(allocator).on(stream);
 *
 * Now exec_policy can be passed to thrust functions.
 *
 * To get an integer array of size 1000 whose lifetime is managed by a
 * unique_ptr, use:
 *
 *   auto int_array = allocator.alloc_unique<int>(1000);
 *
 * int_array.get() gives the raw pointer.
 */
class CUDAWorkspaceAllocator {
DGLContext ctx;
public:
typedef char value_type;
void operator()(void* ptr) const {
runtime::DeviceAPI::Get(ctx)->FreeWorkspace(ctx, ptr);
}
explicit CUDAWorkspaceAllocator(DGLContext ctx) : ctx(ctx) {}
CUDAWorkspaceAllocator& operator=(const CUDAWorkspaceAllocator&) = default;
template <typename T>
std::unique_ptr<T, CUDAWorkspaceAllocator> alloc_unique(
std::size_t size) const {
return std::unique_ptr<T, CUDAWorkspaceAllocator>(
reinterpret_cast<T*>(runtime::DeviceAPI::Get(ctx)->AllocWorkspace(
ctx, sizeof(T) * size)),
*this);
}
char* allocate(std::ptrdiff_t size) const {
return reinterpret_cast<char*>(
runtime::DeviceAPI::Get(ctx)->AllocWorkspace(ctx, size));
}
void deallocate(char* ptr, std::size_t) const {
runtime::DeviceAPI::Get(ctx)->FreeWorkspace(ctx, ptr);
}
};
template <typename T>
inline bool is_zero(T size) {
  return size == 0;
......
Subproject commit 428802d1a5634f96bcd0705fab379ff0113bcf13