Commit 653428bd authored by Lingfan Yu's avatar Lingfan Yu Committed by Minjie Wang

[Feature][Kernel] DGL kernel support (#596)

* [Kernel] Minigun integration and fused kernel support (#519)

* kernel interface

* add minigun

* Add cuda build

* functors

* working on binary elewise

* binary reduce

* change kernel interface

* WIP

* wip

* fix minigun

* compile

* binary reduce kernels

* compile

* simple test passed

* more reducers

* fix thrust problem

* fix cmake

* fix cmake; add proper guard for atomic

* WIP: bcast

* WIP

* bcast kernels

* update to new minigun pass-by-value practice

* broadcasting dim

* add copy src and copy edge

* fix linking

* fix none array problem

* fix copy edge

* add device_type and device_id to backend operator

* cache csr adj, remove cache for adjmat and incmat

* custom ops in backend and pytorch impl

* change dgl-mg kernel python interface

* add id_mapping var

* clean up plus v2e spmv schedule

* spmv schedule & clean up fall back

* symbolic message and reduce func, remove bundle func

* new executors

* new backend interface for dgl kernels and pytorch impl

* minor fix

* fix

* fix docstring, comments, func names

* nodeflow

* fix message id mapping and bugs...

* pytorch test case & fix

* backward binary reduce

* fix bug

* WIP: cusparse

* change to int32 csr for cusparse workaround

* disable cusparse

* change back to int64

* broadcasting backward

* cusparse; WIP: add rev_csr

* unit test for kernels

* pytorch backward with dgl kernel

* edge softmax

* fix backward

* improve softmax

* cache edge on device

* cache mappings on device

* fix partial forward code

* cusparse done

* copy_src_sum with cusparse

* rm id getter

* reduce grad for broadcast

* copy edge reduce backward

* kernel unit test for broadcasting

* full kernel unit test

* add cpu kernels

* edge softmax unit test

* missing ref

* fix compile and small bugs

* fix bug in bcast

* Add backward both

* fix torch utests

* expose infershape

* create out tensor in python

* fix c++ lint

* [Kernel] Add GPU utest and kernel utest (#524)

* fix gpu utest

* cuda utest runnable

* temp disable test nodeflow; unified test for kernel

* cuda test kernel done

* [Kernel] Update kernel branch (#550)

* [Model] add multiprocessing training with sampling. (#484)

* reorganize sampling code.

* add multi-process training.

* speed up gcn_cv

* fix graphsage_cv.

* add new API in graph store.

* update barrier impl.

* support both local and distributed training.

* fix multiprocess train.

* fix.

* fix barrier.

* add script for loading data.

* multiprocessing sampling.

* accel training.

* replace pull with spmv for speedup.

* nodeflow copy from parent with context.

* enable GPU.

* fix a bug in graph store.

* enable multi-GPU training.

* fix lint.

* add comments.

* rename to run_store_server.py

* fix gcn_cv.

* fix a minor bug in sampler.

* handle error better in graph store.

* improve graphsage_cv for distributed mode.

* update README.

* fix.

* update.

* [Tutorial] add sampling tutorial. (#522)

* add sampling tutorial.

* add readme

* update author list.

* fix indent in the code.

* rename the file.

* update tutorial.

* fix the last API.

* update image.

* [BUGFIX] fix the problems in the sampling tutorial. (#523)

* add index.

* update.

* update tutorial.

* fix gpu utest

* cuda utest runnable

* temp disable test nodeflow; unified test for kernel

* cuda test kernel done

* Fixing typo in JTNN after interface change (#536)

* [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515)

* [Bug Fix] Fix inplace op at backend (#546)

* Fix inplace operation

* fix line separator

* [Feature] Add batch and unbatch for immutable graph (#539)

* Add batch and unbatch for immutable graph

* fix line separator

* fix lint

* remove unnecessary include

* fix code review

* [BUGFix] Improve multi-processing training (#526)

* fix.

* add comment.

* remove.

* temp fix.

* initialize for shared memory.

* fix graphsage.

* fix gcn.

* add more unit tests.

* add more tests.

* avoid creating shared-memory exclusively.

* redefine remote initializer.

* improve initializer.

* fix unit test.

* fix lint.

* fix lint.

* initialize data in the graph store server properly.

* fix test.

* fix test.

* fix test.

* small fix.

* add comments.

* cleanup server.

* test graph store with a random port.

* print.

* print to stderr.

* test1

* test2

* remove comment.

* adjust the initializer signature.

* [API] update graph store API. (#549)

* add init_ndata and init_edata in DGLGraph.

* adjust SharedMemoryGraph API.

* print warning.

* fix comment.

* update example

* fix.

* fix examples.

* add unit tests.

* add comments.

* [Refactor] Immutable graph index (#543)

* WIP

* header

* WIP .cc

* WIP

* transpose

* wip

* immutable graph .h and .cc

* WIP: nodeflow.cc

* compile

* remove all tmp dl managed ctx; they caused refcount issue

* one simple test

* WIP: testing

* test_graph

* fix graph index

* fix bug in sampler; pass pytorch utest

* WIP on mxnet

* fix lint

* fix mxnet unittest w/ unfortunate workaround

* fix msvc

* fix lint

* SliceRows and test_nodeflow

* resolve reviews

* resolve reviews

* try fix win ci

* try fix win ci

* poke win ci again

* poke

* lazy multigraph flag; stackoverflow error

* revert node subgraph test

* lazy object

* try fix win build

* try fix win build

* poke ci

* fix build script

* fix compile

* add a todo

* fix reviews

* fix compile

* [Kernel] Update kernel branch (#576)

* [Model] add multiprocessing training with sampling. (#484)

* reorganize sampling code.

* add multi-process training.

* speed up gcn_cv

* fix graphsage_cv.

* add new API in graph store.

* update barrier impl.

* support both local and distributed training.

* fix multiprocess train.

* fix.

* fix barrier.

* add script for loading data.

* multiprocessing sampling.

* accel training.

* replace pull with spmv for speedup.

* nodeflow copy from parent with context.

* enable GPU.

* fix a bug in graph store.

* enable multi-GPU training.

* fix lint.

* add comments.

* rename to run_store_server.py

* fix gcn_cv.

* fix a minor bug in sampler.

* handle error better in graph store.

* improve graphsage_cv for distributed mode.

* update README.

* fix.

* update.

* [Tutorial] add sampling tutorial. (#522)

* add sampling tutorial.

* add readme

* update author list.

* fix indent in the code.

* rename the file.

* update tutorial.

* fix the last API.

* update image.

* [BUGFIX] fix the problems in the sampling tutorial. (#523)

* add index.

* update.

* update tutorial.

* fix gpu utest

* cuda utest runnable

* temp disable test nodeflow; unified test for kernel

* cuda test kernel done

* Fixing typo in JTNN after interface change (#536)

* [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515)

* [Bug Fix] Fix inplace op at backend (#546)

* Fix inplace operation

* fix line separator

* [Feature] Add batch and unbatch for immutable graph (#539)

* Add batch and unbatch for immutable graph

* fix line separator

* fix lint

* remove unnecessary include

* fix code review

* [BUGFix] Improve multi-processing training (#526)

* fix.

* add comment.

* remove.

* temp fix.

* initialize for shared memory.

* fix graphsage.

* fix gcn.

* add more unit tests.

* add more tests.

* avoid creating shared-memory exclusively.

* redefine remote initializer.

* improve initializer.

* fix unit test.

* fix lint.

* fix lint.

* initialize data in the graph store server properly.

* fix test.

* fix test.

* fix test.

* small fix.

* add comments.

* cleanup server.

* test graph store with a random port.

* print.

* print to stderr.

* test1

* test2

* remove comment.

* adjust the initializer signature.

* [API] update graph store API. (#549)

* add init_ndata and init_edata in DGLGraph.

* adjust SharedMemoryGraph API.

* print warning.

* fix comment.

* update example

* fix.

* fix examples.

* add unit tests.

* add comments.

* [Refactor] Immutable graph index (#543)

* WIP

* header

* WIP .cc

* WIP

* transpose

* wip

* immutable graph .h and .cc

* WIP: nodeflow.cc

* compile

* remove all tmp dl managed ctx; they caused refcount issue

* one simple test

* WIP: testing

* test_graph

* fix graph index

* fix bug in sampler; pass pytorch utest

* WIP on mxnet

* fix lint

* fix mxnet unittest w/ unfortunate workaround

* fix msvc

* fix lint

* SliceRows and test_nodeflow

* resolve reviews

* resolve reviews

* try fix win ci

* try fix win ci

* poke win ci again

* poke

* lazy multigraph flag; stackoverflow error

* revert node subgraph test

* lazy object

* try fix win build

* try fix win build

* poke ci

* fix build script

* fix compile

* add a todo

* fix reviews

* fix compile

* all demos use python-3 (#555)

* [DEMO] Reproduce numbers of distributed training in AMLC giant graph paper (#556)

* update

* update

* update

* update num_hops

* fix bug

* update

* report numbers of distributed training in AMLC giant graph paper

* [DEMO] Remove duplicate code for sampling (#557)

* update

* update

* re-use single-machine code

* update

* use relative path

* update

* update

* update

* add __init__.py

* add __init__.py

* import sys, os

* fix typo

* update

* [Perf] Improve performance of graph store. (#554)

* fix.

* use inplace.

* move to shared memory graph store.

* fix.

* add more unit tests.

* fix.

* fix test.

* fix test.

* disable test.

* fix.

* [BUGFIX] fix a bug in edge_ids (#560)

* add test.

* fix compute.

* fix test.

* turn on test.

* fix a bug.

* add test.

* fix.

* disable test.

* [DEMO] Add Pytorch demo for distributed sampler (#562)

* update

* update

* update

* add sender

* update

* remove duplicate code

* [Test] Add gtest to project (#547)

* add gtest module

* add gtest

* fix

* Update CMakeLists.txt

* Update README.md

* [Perf] lazily create msg_index. (#563)

* lazily create msg_index.

* update test.

* [BUGFIX] fix bugs for running GCN on giant graphs. (#561)

* load mxnet csr.

* enable load large csr.

* fix

* fix.

* fix int overflow.

* fix test.

* [BugFix] Fix error when bfs_level = 0 in Entity Classification with RGCN (#559)

* [DEMO] Update demo of distributed sampler (#564)

* update

* update

* update demo

* add network cpp test (#565)

* Add unittest for C++ RPC (#566)

* [CI] Fix CI for cpp test (#570)

* fix CI for cpp test

* update port number

* [Docker] update docker image (#575)

* update docker image

* specify lint version

* rm torch import from unified tests

* [Kernel][Scheduler][MXNet] Scheduler for DGL kernels and MXNet backend support (#541)

* [Model] add multiprocessing training with sampling. (#484)

* reorganize sampling code.

* add multi-process training.

* speed up gcn_cv

* fix graphsage_cv.

* add new API in graph store.

* update barrier impl.

* support both local and distributed training.

* fix multiprocess train.

* fix.

* fix barrier.

* add script for loading data.

* multiprocessing sampling.

* accel training.

* replace pull with spmv for speedup.

* nodeflow copy from parent with context.

* enable GPU.

* fix a bug in graph store.

* enable multi-GPU training.

* fix lint.

* add comments.

* rename to run_store_server.py

* fix gcn_cv.

* fix a minor bug in sampler.

* handle error better in graph store.

* improve graphsage_cv for distributed mode.

* update README.

* fix.

* update.

* [Tutorial] add sampling tutorial. (#522)

* add sampling tutorial.

* add readme

* update author list.

* fix indent in the code.

* rename the file.

* update tutorial.

* fix the last API.

* update image.

* [BUGFIX] fix the problems in the sampling tutorial. (#523)

* add index.

* update.

* update tutorial.

* fix gpu utest

* cuda utest runnable

* temp disable test nodeflow; unified test for kernel

* cuda test kernel done

* edge softmax module

* WIP

* Fixing typo in JTNN after interface change (#536)

* mxnet backend support

* improve reduce grad

* add max to unittest backend

* fix kernel unittest

* [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515)

* lint

* lint

* win build

* [Bug Fix] Fix inplace op at backend (#546)

* Fix inplace operation

* fix line separator

* [Feature] Add batch and unbatch for immutable graph (#539)

* Add batch and unbatch for immutable graph

* fix line separator

* fix lint

* remove unnecessary include

* fix code review

* [BUGFix] Improve multi-processing training (#526)

* fix.

* add comment.

* remove.

* temp fix.

* initialize for shared memory.

* fix graphsage.

* fix gcn.

* add more unit tests.

* add more tests.

* avoid creating shared-memory exclusively.

* redefine remote initializer.

* improve initializer.

* fix unit test.

* fix lint.

* fix lint.

* initialize data in the graph store server properly.

* fix test.

* fix test.

* fix test.

* small fix.

* add comments.

* cleanup server.

* test graph store with a random port.

* print.

* print to stderr.

* test1

* test2

* remove comment.

* adjust the initializer signature.

* try

* fix

* fix

* fix

* fix

* fix

* try

* test

* test

* test

* try

* try

* try

* test

* fix

* try gen_target

* fix gen_target

* fix msvc var_args expand issue

* fix

* [API] update graph store API. (#549)

* add init_ndata and init_edata in DGLGraph.

* adjust SharedMemoryGraph API.

* print warning.

* fix comment.

* update example

* fix.

* fix examples.

* add unit tests.

* add comments.

* [Refactor] Immutable graph index (#543)

* WIP

* header

* WIP .cc

* WIP

* transpose

* wip

* immutable graph .h and .cc

* WIP: nodeflow.cc

* compile

* remove all tmp dl managed ctx; they caused refcount issue

* one simple test

* WIP: testing

* test_graph

* fix graph index

* fix bug in sampler; pass pytorch utest

* WIP on mxnet

* fix lint

* fix mxnet unittest w/ unfortunate workaround

* fix msvc

* fix lint

* SliceRows and test_nodeflow

* resolve reviews

* resolve reviews

* try fix win ci

* try fix win ci

* poke win ci again

* poke

* lazy multigraph flag; stackoverflow error

* revert node subgraph test

* lazy object

* try fix win build

* try fix win build

* poke ci

* fix build script

* fix compile

* add a todo

* fix reviews

* fix compile

* WIP

* WIP

* all demos use python-3 (#555)

* ToImmutable and CopyTo

* [DEMO] Reproduce numbers of distributed training in AMLC giant graph paper (#556)

* update

* update

* update

* update num_hops

* fix bug

* update

* report numbers of distributed training in AMLC giant graph paper

* [DEMO] Remove duplicate code for sampling (#557)

* update

* update

* re-use single-machine code

* update

* use relative path

* update

* update

* update

* add __init__.py

* add __init__.py

* import sys, os

* fix typo

* update

* [Perf] Improve performance of graph store. (#554)

* fix.

* use inplace.

* move to shared memory graph store.

* fix.

* add more unit tests.

* fix.

* fix test.

* fix test.

* disable test.

* fix.

* [BUGFIX] fix a bug in edge_ids (#560)

* add test.

* fix compute.

* fix test.

* turn on test.

* fix a bug.

* add test.

* fix.

* disable test.

* DGLRetValue DGLContext conversion

* [DEMO] Add Pytorch demo for distributed sampler (#562)

* update

* update

* update

* add sender

* update

* remove duplicate code

* [Test] Add gtest to project (#547)

* add gtest module

* add gtest

* fix

* Update CMakeLists.txt

* Update README.md

* Add support to convert immutable graph to 32 bits

* [Perf] lazily create msg_index. (#563)

* lazily create msg_index.

* update test.

* fix binary reduce following new minigun template

* enable both int64 and int32 kernels

* [BUGFIX] fix bugs for running GCN on giant graphs. (#561)

* load mxnet csr.

* enable load large csr.

* fix

* fix.

* fix int overflow.

* fix test.

* new kernel interface done for CPU

* docstring

* rename & docstring

* copy reduce and backward

* [BugFix] Fix error when bfs_level = 0 in Entity Classification with RGCN (#559)

* [DEMO] Update demo of distributed sampler (#564)

* update

* update

* update demo

* adapt cuda kernels to the new interface

* add network cpp test (#565)

* fix bug

* Add unittest for C++ RPC (#566)

* [CI] Fix CI for cpp test (#570)

* fix CI for cpp test

* update port number

* [Docker] update docker image (#575)

* update docker image

* specify lint version

* rm torch import from unified tests

* remove pytorch-specific test_function

* fix unittest

* fix

* fix unittest backend bug in converting tensor to numpy array

* fix

* mxnet version

* [BUGFIX] fix for MXNet 1.5. (#552)

* remove clone.

* turn on numpy compatible.

* Revert "remove clone."

This reverts commit 17bbf76ed72ff178df6b3f35addc428048672457.

* revert format changes

* fix mxnet api name

* revert mistakes in previous revert

* roll back CI to 20190523 build

* fix unittest

* disable test_shared_mem_store.py for now

* remove mxnet/test_specialization.py

* sync win64 test script

* fix lowercase

* missing backend in gpu unit test

* transpose to get forward graph

* pass update all

* add sanity check

* passing test_specialization.py

* fix and pass test_function

* fix check

* fix pytorch softmax

* mxnet kernels

* c++ lint

* pylint

* try

* win build

* fix

* win

* ci enable gpu build

* init submodule recursively

* backend docstring

* try

* test win dev

* doc string

* disable pytorch test_nn

* try to fix windows issue

* bug fixed, revert changes

* [Test] fix CI. (#586)

* disable unit test in mxnet tutorial.

* retry socket connection.

* roll back to set_np_compat

* try to fix multi-processing test hangs when it fails.

* fix test.

* fix.

* doc string

* doc string and clean up

* missing field in ctypes

* fix node flow schedule and unit test

* rename

* pylint

* copy from parent default context

* fix unit test script

* fix

* demo bug in nodeflow gpu test

* [Kernel][Bugfix] fix nodeflow bug (#604)

* fix nodeflow bug

* remove debug code

* add build gtest option

* fix cmake; fix graph index bug in spmv.py

* remove clone

* fix div rhs grad bug

* [Kernel] Support full builtin method, edge softmax and unit tests (#605)

* add full builtin support

* unit test

* unit test backend

* edge softmax

* apply edge with builtin

* fix kernel unit test

* disable mxnet test_shared_mem_store

* gen builtin reduce

* enable mxnet gpu unittest

* revert some changes

* docstring

* add note for the hack

* [Kernel][Unittest][CI] Fix MXNet GPU CI (#607)

* update docker image for MXNet GPU CI

* force all dgl graph input and output on CPU

* fix gpu unittest

* speedup compilation

* add some comments

* lint

* add more comments

* fix as requested

* add some comments

* comment

* lint

* lint

* update pylint

* fix as requested

* lint

* lint

* lint

* docstrings of python DGL kernel entries

* disable lint warnings on arguments in kernel.py

* fix docstring in scheduler

* fix some bug in unittest; try again

* Revert "Merge branch 'kernel' of github.com:zzhang-cn/dgl into kernel"

This reverts commit 1d2299e68b004182ea6130b088de1f1122b18a49, reversing
changes made to ddc97fbf1bec2b7815c0da7c74f7ecb2f428889b.

* Revert "fix some bug in unittest; try again"

This reverts commit ddc97fbf1bec2b7815c0da7c74f7ecb2f428889b.

* more comprehensive kernel test

* remove shape check in test_specialization
parent da0c92a2
......@@ -6,6 +6,8 @@ import numpy as np
import mxnet as mx
import mxnet.ndarray as nd
import numbers
from ... import ndarray as dglnd
from ... import kernel as K
MX_VERSION = LooseVersion(mx.__version__)
# After MXNet 1.5, empty tensors aren't supported by default.
......@@ -92,6 +94,12 @@ def ndim(input):
def context(input):
return input.context
def device_type(ctx):
return ctx.device_type
def device_id(ctx):
return ctx.device_id
def astype(input, ty):
return nd.cast(input, ty)
......@@ -164,9 +172,6 @@ def zeros_like(input):
def ones(shape, dtype, ctx):
return nd.ones(shape, dtype=dtype, ctx=ctx)
def spmm(x, y):
return nd.dot(x, y)
def unsorted_1d_segment_sum(input, seg_id, n_segs, dim):
# TODO: support other dimensions
assert dim == 0, 'MXNet only supports segment sum on first dimension'
......@@ -246,3 +251,141 @@ def zerocopy_to_numpy(arr):
def zerocopy_from_numpy(np_data):
# NOTE: not zerocopy
return nd.array(np_data, dtype=np_data.dtype)
def zerocopy_to_dgl_ndarray(arr):
return dglnd.from_dlpack(arr.to_dlpack_for_read())
def zerocopy_to_dgl_ndarray_for_write(arr):
return dglnd.from_dlpack(arr.to_dlpack_for_write())
def zerocopy_from_dgl_ndarray(arr):
return nd.from_dlpack(arr.to_dlpack())
class BinaryReduce(mx.autograd.Function):
def __init__(self, reducer, binary_op, graph, lhs, rhs, out_size, lhs_map,
rhs_map, out_map):
super(BinaryReduce, self).__init__()
self.reducer = reducer
self.binary_op = binary_op
self.graph = graph
self.lhs = lhs
self.rhs = rhs
self.out_size = out_size
self.lhs_map = lhs_map
self.rhs_map = rhs_map
self.out_map = out_map
def forward(self, lhs_data, rhs_data):
lhs_data_nd = zerocopy_to_dgl_ndarray(lhs_data)
rhs_data_nd = zerocopy_to_dgl_ndarray(rhs_data)
feat_shape = K.infer_binary_feature_shape(lhs_data_nd, rhs_data_nd)
out_data = nd.empty((self.out_size,) + feat_shape,
ctx=lhs_data.context, dtype=lhs_data.dtype)
out_data_nd = zerocopy_to_dgl_ndarray_for_write(out_data)
K.binary_op_reduce(
self.reducer, self.binary_op, self.graph, self.lhs, self.rhs,
lhs_data_nd, rhs_data_nd, out_data_nd, self.lhs_map[0],
self.rhs_map[0], self.out_map[0])
self.save_for_backward(lhs_data_nd, rhs_data_nd, out_data_nd,
feat_shape)
return out_data
def backward(self, grad_out):
lhs_data_nd, rhs_data_nd, out_data_nd, feat_shape = self.saved_tensors
grad_out_nd = zerocopy_to_dgl_ndarray(grad_out)
grad_lhs = nd.empty((lhs_data_nd.shape[0],) + feat_shape,
ctx=grad_out.context, dtype=grad_out.dtype)
K.backward_lhs_binary_op_reduce(
self.reducer, self.binary_op, self.graph, self.lhs, self.rhs,
lhs_data_nd, rhs_data_nd, out_data_nd, grad_out_nd,
zerocopy_to_dgl_ndarray_for_write(grad_lhs), self.lhs_map[1],
self.rhs_map[1], self.out_map[1])
grad_lhs = _reduce_grad(grad_lhs, lhs_data_nd.shape)
grad_rhs = nd.empty((rhs_data_nd.shape[0],) + feat_shape,
ctx=grad_out.context, dtype=grad_out.dtype)
K.backward_rhs_binary_op_reduce(
self.reducer, self.binary_op, self.graph, self.lhs, self.rhs,
lhs_data_nd, rhs_data_nd, out_data_nd, grad_out_nd,
zerocopy_to_dgl_ndarray_for_write(grad_rhs), self.lhs_map[1],
self.rhs_map[1], self.out_map[1])
grad_rhs = _reduce_grad(grad_rhs, rhs_data_nd.shape)
return grad_lhs, grad_rhs
def binary_reduce(reducer, binary_op, graph, lhs, rhs, lhs_data, rhs_data,
out_size, lhs_map, rhs_map, out_map):
func = BinaryReduce(reducer, binary_op, graph, lhs, rhs, out_size, lhs_map,
rhs_map, out_map)
return func(lhs_data, rhs_data)
class CopyReduce(mx.autograd.Function):
def __init__(self, reducer, graph, target, out_size, in_map, out_map):
super(CopyReduce, self).__init__()
self.reducer = reducer
self.graph = graph
self.target = target
self.out_size = out_size
self.in_map = in_map
self.out_map = out_map
def forward(self, in_data):
feat_shape = in_data.shape[1:]
out_data = nd.empty((self.out_size,) + feat_shape,
ctx=in_data.context, dtype=in_data.dtype)
in_data_nd = zerocopy_to_dgl_ndarray(in_data)
out_data_nd = zerocopy_to_dgl_ndarray_for_write(out_data)
K.copy_reduce(
self.reducer, self.graph, self.target, in_data_nd, out_data_nd,
self.in_map[0], self.out_map[0])
self.save_for_backward(in_data_nd, out_data_nd)
return out_data
def backward(self, grad_out):
in_data_nd, out_data_nd = self.saved_tensors
grad_out_nd = zerocopy_to_dgl_ndarray(grad_out)
grad_in = nd.empty(in_data_nd.shape, ctx=grad_out.context,
dtype=grad_out.dtype)
K.backward_copy_reduce(
self.reducer, self.graph, self.target, in_data_nd, out_data_nd,
grad_out_nd, zerocopy_to_dgl_ndarray_for_write(grad_in),
self.in_map[1], self.out_map[1])
return grad_in
def copy_reduce(reducer, graph, target, in_data, out_size, in_map, out_map):
func = CopyReduce(reducer, graph, target, out_size, in_map, out_map)
return func(in_data)
def _reduce_grad(grad, shape):
"""Reduce gradient on the broadcast dimension
If there is broadcast in forward pass, gradients need to be reduced on
broadcast dimension. This function checks the input tensor shape and
gradient shape and perform the reduction.
Parameters
----------
grad: Tensor
Gradient tensor
shape: tuple
Shape of input tensor
Returns
-------
Tensor
"""
grad_shape = grad.shape[1:]
in_shape = shape[1:]
if in_shape == grad_shape:
# no need to reduce
return grad
num_to_squeeze = len(grad_shape) - len(in_shape)
# pad in_shape
in_shape = (1,) * num_to_squeeze + in_shape
reduce_idx = np.nonzero(np.array(grad_shape) - np.array(in_shape))[0]
reduce_idx += 1 # skip batch dim
grad = grad.sum(axis=tuple(reduce_idx), keepdims=True)
return grad.reshape(shape)
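# Illustrative sketch only (plain numpy; not part of the diff): what the
# _reduce_grad helper above computes. If the forward pass broadcast a
# (5, 3, 1) input against a (5, 3, 4) tensor, the incoming gradient has
# shape (5, 3, 4) and must be summed back over the broadcast axis.
import numpy as np
grad = np.ones((5, 3, 4))          # gradient w.r.t. the broadcast output
in_shape = (5, 3, 1)               # original (unbroadcast) input shape
grad_feat, in_feat = grad.shape[1:], in_shape[1:]
in_feat = (1,) * (len(grad_feat) - len(in_feat)) + in_feat
axes = tuple(1 + np.nonzero(np.array(grad_feat) - np.array(in_feat))[0])
reduced = grad.sum(axis=axes, keepdims=True).reshape(in_shape)
print(reduced.shape)               # (5, 3, 1); every entry equals 4.0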
......@@ -5,6 +5,9 @@ from distutils.version import LooseVersion
import torch as th
from torch.utils import dlpack
from ... import ndarray as nd
from ... import kernel as K
TH_VERSION = LooseVersion(th.__version__)
def data_type_dict():
......@@ -31,24 +34,12 @@ def get_preferred_sparse_format():
"""
return "coo"
if TH_VERSION.version[0] == 0:
def sparse_matrix(data, index, shape, force_format=False):
fmt = index[0]
if fmt != 'coo':
raise TypeError('Pytorch backend only supports COO format. But got %s.' % fmt)
# NOTE: use _sparse_coo_tensor_unsafe to avoid unnecessary boundary check
spmat = th._sparse_coo_tensor_unsafe(index[1], data, shape)
# No conversion is required.
return spmat, None
else:
# VERSION 1.0+
def sparse_matrix(data, index, shape, force_format=False):
fmt = index[0]
if fmt != 'coo':
raise TypeError('Pytorch backend only supports COO format. But got %s.' % fmt)
spmat = th.sparse_coo_tensor(index[1], data, shape)
# No conversion is required.
return spmat, None
def sparse_matrix(data, index, shape, force_format=False):
fmt = index[0]
if fmt != 'coo':
raise TypeError('Pytorch backend only supports COO format. But got %s.' % fmt)
spmat = th.sparse_coo_tensor(index[1], data, shape)
return spmat, None
def sparse_matrix_indices(spmat):
return ('coo', spmat._indices())
......@@ -68,6 +59,15 @@ def ndim(input):
def context(input):
return input.device
def device_type(ctx):
return ctx.type
def device_id(ctx):
if ctx.index is None:
return 0
else:
return ctx.index
def astype(input, ty):
return input.type(ty)
......@@ -135,18 +135,6 @@ def zeros_like(input):
def ones(shape, dtype, ctx):
return th.ones(shape, dtype=dtype, device=ctx)
def spmm(x, y):
dst, src = x._indices()
# scatter index
index = dst.view(-1, 1).expand(-1, y.shape[1])
# zero tensor to be scatter_add to
out = y.new_full((x.shape[0], y.shape[1]), 0)
# look up src features and multiply by edge features
# Note: using y[src] instead of index_select will lead to terrible
# performance in backward
feature = th.index_select(y, 0, src) * x._values().unsqueeze(-1)
return out.scatter_add(0, index, feature)
def unsorted_1d_segment_sum(input, seg_id, n_segs, dim):
y = th.zeros(n_segs, *input.shape[1:]).to(input)
seg_id = seg_id.view((-1,) + (1,) * (input.dim() - 1)).expand_as(input)
......@@ -201,3 +189,121 @@ def zerocopy_to_numpy(input):
def zerocopy_from_numpy(np_array):
return th.from_numpy(np_array)
def zerocopy_to_dgl_ndarray(input):
return nd.from_dlpack(dlpack.to_dlpack(input.contiguous()))
def zerocopy_from_dgl_ndarray(input):
return dlpack.from_dlpack(input.to_dlpack())
class BinaryReduce(th.autograd.Function):
@staticmethod
def forward(ctx, reducer, binary_op, graph, lhs, rhs, lhs_data, rhs_data,
out_size, lhs_map, rhs_map, out_map):
lhs_data_nd = zerocopy_to_dgl_ndarray(lhs_data)
rhs_data_nd = zerocopy_to_dgl_ndarray(rhs_data)
feat_shape = K.infer_binary_feature_shape(lhs_data_nd, rhs_data_nd)
out_data = lhs_data.new_empty((out_size,) + feat_shape)
out_data_nd = zerocopy_to_dgl_ndarray(out_data)
K.binary_op_reduce(
reducer, binary_op, graph, lhs, rhs, lhs_data_nd, rhs_data_nd,
out_data_nd, lhs_map[0], rhs_map[0], out_map[0])
# save_for_backward can only save variables
ctx.backward_cache = (reducer, binary_op, graph, lhs, rhs, lhs_map,
rhs_map, out_map, lhs_data_nd, rhs_data_nd,
out_data_nd, feat_shape)
return out_data
@staticmethod
def backward(ctx, grad_out):
reducer, binary_op, graph, lhs, rhs, lhs_map, rhs_map, out_map, \
lhs_data_nd, rhs_data_nd, out_data_nd, feat_shape \
= ctx.backward_cache
ctx.backward_cache = None
grad_lhs = None
grad_rhs = None
grad_out_nd = zerocopy_to_dgl_ndarray(grad_out)
if ctx.needs_input_grad[5]:
grad_lhs = grad_out.new_empty((lhs_data_nd.shape[0],) + feat_shape)
K.backward_lhs_binary_op_reduce(
reducer, binary_op, graph, lhs, rhs, lhs_data_nd, rhs_data_nd,
out_data_nd, grad_out_nd, zerocopy_to_dgl_ndarray(grad_lhs),
lhs_map[1], rhs_map[1], out_map[1])
grad_lhs = _reduce_grad(grad_lhs, lhs_data_nd.shape)
if ctx.needs_input_grad[6]:
grad_rhs = grad_out.new_empty((rhs_data_nd.shape[0],) + feat_shape)
K.backward_rhs_binary_op_reduce(
reducer, binary_op, graph, lhs, rhs, lhs_data_nd, rhs_data_nd,
out_data_nd, grad_out_nd, zerocopy_to_dgl_ndarray(grad_rhs),
lhs_map[1], rhs_map[1], out_map[1])
grad_rhs = _reduce_grad(grad_rhs, rhs_data_nd.shape)
return None, None, None, None, None, grad_lhs, grad_rhs, None, None, \
None, None
class CopyReduce(th.autograd.Function):
@staticmethod
def forward(ctx, reducer, graph, target, in_data, out_size, in_map,
out_map):
out_data = in_data.new_empty((out_size,) + in_data.shape[1:])
in_data_nd = zerocopy_to_dgl_ndarray(in_data)
out_data_nd = zerocopy_to_dgl_ndarray(out_data)
K.copy_reduce(
reducer, graph, target, in_data_nd, out_data_nd, in_map[0],
out_map[0])
# save_for_backward can only save variables
ctx.backward_cache = (reducer, graph, target, in_map, out_map,
in_data_nd, out_data_nd)
return out_data
@staticmethod
def backward(ctx, grad_out):
reducer, graph, target, in_map, out_map, in_data_nd, out_data_nd \
= ctx.backward_cache
ctx.backward_cache = None
grad_in = None
grad_out_nd = zerocopy_to_dgl_ndarray(grad_out)
if ctx.needs_input_grad[3]:
grad_in = grad_out.new_empty(in_data_nd.shape)
K.backward_copy_reduce(
reducer, graph, target, in_data_nd, out_data_nd, grad_out_nd,
zerocopy_to_dgl_ndarray(grad_in), in_map[1], out_map[1])
return None, None, None, grad_in, None, None, None
binary_reduce = BinaryReduce.apply
copy_reduce = CopyReduce.apply
def _reduce_grad(grad, shape):
"""Reduce gradient on the broadcast dimension
If there is broadcast in forward pass, gradients need to be reduced on
broadcast dimension. This function checks the input tensor shape and
gradient shape and perform the reduction.
Parameters
----------
grad: Tensor
Gradient tensor
shape: tuple
Shape of input tensor
Returns
-------
Tensor
"""
grad_shape = grad.shape[1:]
in_shape = shape[1:]
if in_shape == grad_shape:
# no need to reduce
return grad
num_to_squeeze = len(grad_shape) - len(in_shape)
# pad in_shape
in_shape = (1,) * num_to_squeeze + in_shape
reduce_idx = th.nonzero(th.tensor(grad_shape) - th.tensor(in_shape))
reduce_idx += 1 # skip batch dim
grad = grad.sum(dim=tuple(reduce_idx), keepdim=True)
return grad.view(shape)
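# Minimal self-contained sketch (not part of the diff) of the autograd pattern
# used by BinaryReduce/CopyReduce above: non-tensor arguments (graph index,
# id maps) are stashed on ctx.backward_cache because save_for_backward only
# accepts tensors, and None gradients are returned for the non-tensor inputs.
import torch as th

class ScaleBy(th.autograd.Function):
    @staticmethod
    def forward(ctx, factor, x):            # factor is a plain Python float
        ctx.backward_cache = factor         # cache non-tensor state on ctx
        return x * factor

    @staticmethod
    def backward(ctx, grad_out):
        factor = ctx.backward_cache
        ctx.backward_cache = None           # release the cached state
        return None, grad_out * factor      # None grad for the float argument

x = th.ones(3, requires_grad=True)
ScaleBy.apply(2.0, x).sum().backward()
print(x.grad)                               # tensor([2., 2., 2.])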
"""Built-in function base class"""
from __future__ import absolute_import
__all__ = ['BuiltinFunction', 'BundledFunction']
__all__ = ['BuiltinFunction', 'TargetCode']
class BuiltinFunction(object):
"""Base builtin function class."""
@property
def name(self):
"""Return the name of this builtin function."""
raise NotImplementedError
class BundledFunction(object):
"""A utility class that bundles multiple functions.
class TargetCode(object):
"""Code for target
Parameters
----------
fn_list : list of callable
The function list.
Note: must be consistent with the target code definition in C++ side:
src/kernel/binary_reduce_common.h
"""
def __init__(self, fn_list):
self.fn_list = fn_list
SRC = 0
DST = 1
EDGE = 2
def __call__(self, *args, **kwargs):
"""Regular computation of this builtin function
CODE2STR = {
0: "src",
1: "dst",
2: "edge",
}
This will be used when optimization is not available and should
ONLY be called by DGL framework.
"""
ret = {}
for fn in self.fn_list:
ret.update(fn(*args, **kwargs))
return ret
class BuiltinFunction(object):
"""Base builtin function class."""
@property
def name(self):
"""Return the name."""
return "bundled"
"""Return the name of this builtin function."""
raise NotImplementedError
"""Built-in message function."""
from __future__ import absolute_import
import operator
import sys
from itertools import product
from .base import BuiltinFunction
from .. import backend as F
from .base import BuiltinFunction, TargetCode
from ..runtime import ir
from ..runtime.ir import var
__all__ = ["src_mul_edge", "copy_src", "copy_edge"]
__all__ = ["src_mul_edge", "copy_src", "copy_edge", "copy_u", "copy_e"]
class MessageFunction(BuiltinFunction):
"""Base builtin message function class."""
def __call__(self, edges):
"""Regular computation of this builtin function
This will be used when optimization is not available and should
ONLY be called by DGL framework.
def _invoke(self, graph, src_frame, dst_frame, edge_frame, out_size,
src_map, dst_map, edge_map, out_map, reducer="none"):
"""Symbolic computation of this builtin function to create
runtime.executor
"""
raise NotImplementedError
......@@ -25,195 +27,223 @@ class MessageFunction(BuiltinFunction):
"""Return the name of this builtin function."""
raise NotImplementedError
def is_spmv_supported(self, g):
"""Return whether the SPMV optimization is supported."""
raise NotImplementedError
@property
def use_edge_feature(self):
"""Return true if the message function uses edge feature data."""
raise NotImplementedError
def _is_spmv_supported_edge_feat(g, field):
"""Return whether the edge feature shape supports SPMV optimization.
Only scalar feature is supported currently.
"""
feat = g.get_e_repr()[field]
shape = F.shape(feat)
return len(shape) == 1 or (len(shape) == 2 and shape[1] == 1)
class SrcMulEdgeMessageFunction(MessageFunction):
"""Class for the src_mul_edge builtin message function.
class BinaryMessageFunction(MessageFunction):
"""Class for the lhs_op_rhs builtin message function.
See Also
--------
src_mul_edge
"""
def __init__(self, mul_op, src_field, edge_field, out_field):
self.mul_op = mul_op
self.src_field = src_field
self.edge_field = edge_field
def __init__(self, binary_op, lhs, rhs, lhs_field, rhs_field, out_field):
self.binary_op = binary_op
self.lhs = lhs
self.rhs = rhs
self.lhs_field = lhs_field
self.rhs_field = rhs_field
self.out_field = out_field
def is_spmv_supported(self, g):
"""Return true if this supports SPMV optimization.
Parameters
----------
g : DGLGraph
The graph.
Returns
-------
bool
True if this supports SPMV optimization.
def _invoke(self, graph, src_frame, dst_frame, edge_frame, out_size,
src_map, dst_map, edge_map, out_map, reducer="none"):
"""Symbolic computation of builtin binary message function to create
runtime.executor
"""
return _is_spmv_supported_edge_feat(g, self.edge_field)
def __call__(self, edges):
"""Regular computation of this builtin function
This will be used when optimization is not available and should
ONLY be called by DGL framework.
"""
sdata = edges.src[self.src_field]
edata = edges.data[self.edge_field]
# Due to the different broadcasting semantics of different backends,
# we need to broadcast the sdata and edata to be of the same rank.
rank = max(F.ndim(sdata), F.ndim(edata))
sshape = F.shape(sdata)
eshape = F.shape(edata)
sdata = F.reshape(sdata, sshape + (1,) * (rank - F.ndim(sdata)))
edata = F.reshape(edata, eshape + (1,) * (rank - F.ndim(edata)))
ret = self.mul_op(sdata, edata)
return {self.out_field : ret}
graph = var.GRAPH(graph)
in_frames = (src_frame, dst_frame, edge_frame)
in_maps = (src_map, dst_map, edge_map)
lhs_data = ir.READ_COL(in_frames[self.lhs], var.STR(self.lhs_field))
rhs_data = ir.READ_COL(in_frames[self.rhs], var.STR(self.rhs_field))
lhs_map = var.MAP(in_maps[self.lhs])
rhs_map = var.MAP(in_maps[self.rhs])
out_map = var.MAP(out_map)
return ir.BINARY_REDUCE(reducer, self.binary_op, graph, self.lhs,
self.rhs, lhs_data, rhs_data, out_size,
lhs_map, rhs_map, out_map)
@property
def name(self):
return "src_mul_edge"
lhs = TargetCode.CODE2STR[self.lhs]
rhs = TargetCode.CODE2STR[self.rhs]
return "{}_{}_{}".format(lhs, self.binary_op, rhs)
@property
def use_edge_feature(self):
"""Return true if the message function uses edge feature data."""
return True
class CopySrcMessageFunction(MessageFunction):
"""Class for the copy_src builtin message function.
class CopyMessageFunction(MessageFunction):
"""Class for the copy builtin message function.
See Also
--------
copy_src
"""
def __init__(self, src_field, out_field):
self.src_field = src_field
def __init__(self, target, in_field, out_field):
self.target = target
self.in_field = in_field
self.out_field = out_field
def is_spmv_supported(self, g):
"""Return true if this supports SPMV optimization.
def _invoke(self, graph, src_frame, dst_frame, edge_frame, out_size,
src_map, dst_map, edge_map, out_map, reducer="none"):
"""Symbolic computation of builtin message function to create
runtime.executor
"""
graph = var.GRAPH(graph)
in_frames = (src_frame, dst_frame, edge_frame)
in_maps = (src_map, dst_map, edge_map)
in_data = ir.READ_COL(in_frames[self.target], var.STR(self.in_field))
in_map = var.MAP(in_maps[self.target])
out_map = var.MAP(out_map)
return ir.COPY_REDUCE(reducer, graph, self.target, in_data, out_size,
in_map, out_map)
Parameters
----------
g : DGLGraph
The graph.
@property
def name(self):
return "copy_{}".format(TargetCode.CODE2STR[self.target])
Returns
-------
bool
True if this supports SPMV optimization.
"""
return True
def __call__(self, edges):
"""Regular computation of this builtin function
def copy_u(u, out):
"""Builtin message function that computes message using source node
feature.
This will be used when optimization is not available and should
ONLY be called by DGL framework.
"""
return {self.out_field : edges.src[self.src_field]}
Parameters
----------
u : str
The source feature field.
out : str
The output message field.
@property
def name(self):
return "copy_src"
Examples
--------
>>> import dgl
>>> message_func = dgl.function.copy_u('h', 'm')
@property
def use_edge_feature(self):
"""Return true if the message function uses edge feature data."""
return False
The above example is equivalent to the following user defined function:
class CopyEdgeMessageFunction(MessageFunction):
"""Class for the copy_edge builtin message function.
>>> def message_func(edges):
>>> return {'m': edges.src['h']}
"""
return CopyMessageFunction(TargetCode.SRC, u, out)
See Also
def copy_e(e, out):
"""Builtin message function that computes message using edge feature.
Parameters
----------
e : str
The edge feature field.
out : str
The output message field.
Examples
--------
copy_edge
>>> import dgl
>>> message_func = dgl.function.copy_e('h', 'm')
The above example is equivalent to the following user defined function:
>>> def message_func(edges):
>>> return {'m': edges.data['h']}
"""
def __init__(self, edge_field=None, out_field=None):
self.edge_field = edge_field
self.out_field = out_field
return CopyMessageFunction(TargetCode.EDGE, e, out)
def is_spmv_supported(self, g):
"""Return true if this supports SPMV optimization.
Parameters
----------
g : DGLGraph
The graph.
###############################################################################
# Generate all following builtin message functions:
# u_add_v, u_sub_v, u_mul_v, u_div_v
# u_add_e, u_sub_e, u_mul_e, u_div_e
# v_add_u, v_sub_u, v_mul_u, v_div_u
# v_add_e, v_sub_e, v_mul_e, v_div_e
# e_add_u, e_sub_u, e_mul_u, e_div_u
# e_add_v, e_sub_v, e_mul_v, e_div_v
Returns
-------
bool
True if this supports SPMV optimization.
"""
# TODO: support this with e2v spmv
return False
# return _is_spmv_supported_edge_feat(g, self.edge_field)
_TARGET_MAP = {
"u": TargetCode.SRC,
"v": TargetCode.DST,
"e": TargetCode.EDGE,
}
def __call__(self, edges):
"""Regular computation of this builtin function
This will be used when optimization is not available and should
ONLY be called by DGL framework.
"""
return {self.out_field : edges.data[self.edge_field]}
def _gen_message_builtin(lhs, rhs, binary_op):
name = "{}_{}_{}".format(lhs, binary_op, rhs)
docstring = """Builtin message function that computes message by performing
binary operation {} between {} feature and {} feature.
@property
def name(self):
return "copy_edge"
Parameters
----------
{} : str
The {} feature field.
{} : str
The {} feature field.
out : str
The output message field.
@property
def use_edge_feature(self):
"""Return true if the message function uses edge feature data."""
return True
Examples
--------
>>> import dgl
>>> message_func = dgl.function.{}('h', 'h', 'm')
""".format(binary_op,
TargetCode.CODE2STR[_TARGET_MAP[lhs]],
TargetCode.CODE2STR[_TARGET_MAP[rhs]],
lhs, TargetCode.CODE2STR[_TARGET_MAP[lhs]],
rhs, TargetCode.CODE2STR[_TARGET_MAP[rhs]],
name)
def func(lhs_field, rhs_field, out):
return BinaryMessageFunction(
binary_op, _TARGET_MAP[lhs],
_TARGET_MAP[rhs], lhs_field, rhs_field, out)
func.__name__ = name
func.__doc__ = docstring
return func
def _register_builtin_message_func():
"""Register builtin message functions"""
target = ["u", "v", "e"]
for lhs, rhs in product(target, target):
if lhs != rhs:
for binary_op in ["add", "sub", "mul", "div"]:
func = _gen_message_builtin(lhs, rhs, binary_op)
setattr(sys.modules[__name__], func.__name__, func)
__all__.append(func.__name__)
_register_builtin_message_func()
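# Usage sketch only (assumes a DGL build from this branch; the DGLGraph calls
# shown are the existing graph API, not part of this diff). A generated
# builtin like u_mul_e behaves the same as the equivalent UDF.
import torch as th
import dgl
import dgl.function as fn

g = dgl.DGLGraph()
g.add_nodes(3)
g.add_edges([0, 1, 2], [1, 2, 0])
g.ndata['h'] = th.ones(3, 4)
g.edata['w'] = th.full((3, 1), 0.5)         # broadcasts along the feature dim
g.apply_edges(fn.u_mul_e('h', 'w', 'm'))    # builtin message function
g.apply_edges(lambda edges: {'m2': edges.src['h'] * edges.data['w']})
assert th.allclose(g.edata['m'], g.edata['m2'])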
##############################################################################
# For backward compatibility
def src_mul_edge(src, edge, out):
"""Builtin message function that computes message by multiplying source
node features with edge features.
"""Builtin message function that computes message by performing
binary operation mul between src feature and dst feature.
Notes
-----
This function is deprecated. Please use u_mul_e instead.
Parameters
----------
src : str
The source feature field.
edge : str
The edge feature field.
dst : str
The destination feature field.
out : str
The output message field.
Examples
--------
>>> import dgl
>>> message_func = dgl.function.src_mul_edge(src='h', edge='w', out='m')
The above example is equivalent to the following user defined function:
>>> def message_func(edges):
>>> return {'m': edges.src['h'] * edges.data['w']}
>>> message_func = dgl.function.src_mul_edge('h', 'h', 'm')
"""
return SrcMulEdgeMessageFunction(operator.mul, src, edge, out)
return getattr(sys.modules[__name__], "u_mul_e")(src, edge, out)
def copy_src(src, out):
"""Builtin message function that computes message using source node feature.
"""Builtin message function that computes message using source node
feature.
Notes
-----
This function is deprecated. Please use copy_u instead.
Parameters
----------
......@@ -225,18 +255,23 @@ def copy_src(src, out):
Examples
--------
>>> import dgl
>>> message_func = dgl.function.copy_src(src='h', out='m')
>>> message_func = dgl.function.copy_src('h', 'm')
The above example is equivalent to the following user defined function:
>>> def message_func(edges):
>>> return {'m': edges.src['h']}
"""
return CopySrcMessageFunction(src, out)
return copy_u(src, out)
def copy_edge(edge, out):
"""Builtin message function that computes message using edge feature.
Notes
-----
This function is deprecated. Please use copy_e instead.
Parameters
----------
edge : str
......@@ -247,11 +282,11 @@ def copy_edge(edge, out):
Examples
--------
>>> import dgl
>>> message_func = dgl.function.copy_edge(edge='h', out='m')
>>> message_func = dgl.function.copy_edge('h', 'm')
The above example is equivalent to the following user defined function:
>>> def message_func(edges):
>>> return {'m': edges.data['h']}
"""
return CopyEdgeMessageFunction(edge, out)
return copy_e(edge, out)
......@@ -2,19 +2,20 @@
# pylint: disable=redefined-builtin
from __future__ import absolute_import
from .. import backend as F
from .base import BuiltinFunction
import sys
from .base import BuiltinFunction, TargetCode
from ..runtime import ir
from ..runtime.ir import var
__all__ = ["sum", "max"]
class ReduceFunction(BuiltinFunction):
"""Base builtin reduce function class."""
def __call__(self, nodes):
"""Regular computation of this builtin function
This will be used when optimization is not available and should
ONLY be called by DGL framework.
def _invoke(self, graph, edge_frame, out_size, edge_map=None,
out_map=None):
"""Symbolic computation of this builtin function to create
runtime.executor
"""
raise NotImplementedError
......@@ -23,34 +24,37 @@ class ReduceFunction(BuiltinFunction):
"""Return the name of this builtin function."""
raise NotImplementedError
def is_spmv_supported(self):
"""Return whether the SPMV optimization is supported."""
raise NotImplementedError
class SimpleReduceFunction(ReduceFunction):
"""Builtin reduce function that aggregates a single field into another
single field."""
def __init__(self, name, reduce_op, msg_field, out_field):
def __init__(self, name, msg_field, out_field):
self._name = name
self.reduce_op = reduce_op
self.msg_field = msg_field
self.out_field = out_field
def is_spmv_supported(self):
"""Return whether the SPMV optimization is supported."""
# NOTE: only sum is supported right now.
return self._name == "sum"
def __call__(self, nodes):
return {self.out_field : self.reduce_op(nodes.mailbox[self.msg_field], 1)}
def _invoke(self, graph, edge_frame, out_size, edge_map=None,
out_map=None):
"""Symbolic execution of this builtin function"""
reducer = self._name
graph = var.GRAPH(graph)
edge_map = var.MAP(edge_map)
out_map = var.MAP(out_map)
edge_data = ir.READ_COL(edge_frame, var.STR(self.msg_field))
return ir.COPY_REDUCE(reducer, graph, TargetCode.EDGE, edge_data,
out_size, edge_map, out_map)
@property
def name(self):
return self._name
def sum(msg, out):
"""Builtin reduce function that aggregates messages by sum.
###############################################################################
# Generate all following reducer functions:
# sum, max, min, prod
def _gen_reduce_builtin(reducer):
docstring = """Builtin reduce function that aggregates messages by {0}.
Parameters
----------
......@@ -61,37 +65,32 @@ def sum(msg, out):
Examples
--------
>>> import dgl
>>> reduce_func = dgl.function.sum(msg='m', out='h')
>>> reduce_func = dgl.function.{0}('m', 'h')
The above example is equivalent to the following user defined function
(if using PyTorch):
>>> import torch
>>> def reduce_func(nodes):
>>> return {'h': torch.sum(nodes.mailbox['m'], dim=1)}
"""
return SimpleReduceFunction("sum", F.sum, msg, out)
>>> return {{'h': torch.{0}(nodes.mailbox['m'], dim=1)}}
""".format(reducer)
def max(msg, out):
"""Builtin reduce function that aggregates messages by max.
def func(msg, out):
return SimpleReduceFunction(reducer, msg, out)
func.__name__ = reducer
func.__doc__ = docstring
return func
Parameters
----------
msg : str
The message field.
out : str
The output node feature field.
Examples
--------
>>> import dgl
>>> reduce_func = dgl.function.max(msg='m', out='h')
__all__ = []
The above example is equivalent to the following user defined function
(if using PyTorch):
>>> import torch
>>> def reduce_func(nodes):
>>> return {'h': torch.max(nodes.mailbox['m'], dim=1)[0]}
"""
return SimpleReduceFunction("max", F.max, msg, out)
def _register_builtin_reduce_func():
"""Register builtin reduce functions"""
for reduce_op in ["max", "min", "sum", "prod"]:
builtin = _gen_reduce_builtin(reduce_op)
setattr(sys.modules[__name__], reduce_op, builtin)
__all__.append(reduce_op)
_register_builtin_reduce_func()
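# Usage sketch only (assumes a DGL build from this branch). The generated
# reducers pair with the message builtins above; the UDF in the docstring is
# the equivalent fallback path.
import torch as th
import dgl
import dgl.function as fn

g = dgl.DGLGraph()
g.add_nodes(3)
g.add_edges([0, 1, 2], [1, 2, 0])           # toy cycle: 0->1, 1->2, 2->0
g.ndata['h'] = th.arange(3, dtype=th.float32).view(3, 1)
g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h_sum'))
print(g.ndata['h_sum'])                     # each node receives the 'h' of its predecessor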
......@@ -3048,7 +3048,7 @@ class DGLGraph(DGLBaseGraph):
n_repr = self.get_n_repr(v)
nbatch = NodeBatch(self, v, n_repr)
n_mask = predicate(nbatch)
n_mask = F.copy_to(predicate(nbatch), F.cpu())
if is_all(nodes):
return F.nonzero_1d(n_mask)
......@@ -3121,7 +3121,7 @@ class DGLGraph(DGLBaseGraph):
edge_data = self.get_e_repr(eid)
dst_data = self.get_n_repr(v)
ebatch = EdgeBatch(self, (u, v, eid), src_data, edge_data, dst_data)
e_mask = predicate(ebatch)
e_mask = F.copy_to(predicate(ebatch), F.cpu())
if is_all(edges):
return F.nonzero_1d(e_mask)
......
......@@ -427,16 +427,16 @@ class GraphIndex(object):
utils.Index
The edge ids.
"""
key = 'edges_s%s' % order
if key not in self._cache:
if order is None:
order = ""
edge_array = _CAPI_DGLGraphEdges(self._handle, order)
src = utils.toindex(edge_array(0))
dst = utils.toindex(edge_array(1))
eid = utils.toindex(edge_array(2))
self._cache[key] = (src, dst, eid)
return self._cache[key]
if order is None:
order = ""
edge_array = _CAPI_DGLGraphEdges(self._handle, order)
src = edge_array(0)
dst = edge_array(1)
eid = edge_array(2)
src = utils.toindex(src)
dst = utils.toindex(dst)
eid = utils.toindex(eid)
return src, dst, eid
def in_degree(self, v):
"""Return the in degree of the node.
......@@ -598,8 +598,38 @@ class GraphIndex(object):
else:
raise Exception("unknown format")
@utils.cached_member(cache='_cache', prefix='immu_gidx')
def get_immutable_gidx(self, ctx):
"""Create an immutable graph index and copy to the given device context.
Note: this internal function is for DGL scheduler use only
Parameters
----------
ctx : DGLContext
The context of the returned graph.
Returns
-------
GraphIndex
"""
return self.to_immutable().asbits(self.bits_needed()).copy_to(ctx)
def get_csr_shuffle_order(self):
"""Return the edge shuffling order when a coo graph is converted to csr format
Returns
-------
tuple of two utils.Index
The first element of the tuple is the shuffle order for outward graph
The second element of the tuple is the shuffle order for inward graph
"""
csr = _CAPI_DGLGraphGetAdj(self._handle, True, "csr")
order = csr(2)
rev_csr = _CAPI_DGLGraphGetAdj(self._handle, False, "csr")
rev_order = rev_csr(2)
return utils.toindex(order), utils.toindex(rev_order)
@utils.cached_member(cache='_cache', prefix='adj')
def adjacency_matrix(self, transpose, ctx):
"""Return the adjacency matrix representation of this graph.
......@@ -650,7 +680,6 @@ class GraphIndex(object):
else:
raise Exception("unknown format")
@utils.cached_member(cache='_cache', prefix='inc')
def incidence_matrix(self, typestr, ctx):
"""Return the incidence matrix representation of this graph.
......@@ -761,6 +790,86 @@ class GraphIndex(object):
handle = _CAPI_DGLGraphLineGraph(self._handle, backtracking)
return GraphIndex(handle)
def to_immutable(self):
"""Convert this graph index to an immutable one.
Returns
-------
GraphIndex
An immutable graph index.
"""
handle = _CAPI_DGLToImmutable(self._handle)
return GraphIndex(handle)
def ctx(self):
"""Return the context of this graph index.
Returns
-------
DGLContext
The context of the graph.
"""
return _CAPI_DGLGraphContext(self._handle)
def copy_to(self, ctx):
"""Copy this immutable graph index to the given device context.
NOTE: this method only works for immutable graph index
Parameters
----------
ctx : DGLContext
The target device context.
Returns
-------
GraphIndex
The graph index on the given device context.
"""
handle = _CAPI_DGLImmutableGraphCopyTo(self._handle, ctx.device_type, ctx.device_id)
return GraphIndex(handle)
def nbits(self):
"""Return the number of integer bits used in the storage (32 or 64).
Returns
-------
int
The number of bits.
"""
return _CAPI_DGLGraphNumBits(self._handle)
def bits_needed(self):
"""Return the number of integer bits needed to represent the graph
Returns
-------
int
The number of bits needed
"""
if self.number_of_edges() >= 0x80000000 or self.number_of_nodes() >= 0x80000000:
return 64
else:
return 32
def asbits(self, bits):
"""Transform the graph to a new one with the given number of bits storage.
NOTE: this method only works for immutable graph index
Parameters
----------
bits : int
The number of integer bits (32 or 64)
Returns
-------
GraphIndex
The graph index stored using the given number of bits.
"""
handle = _CAPI_DGLImmutableGraphAsNumBits(self._handle, int(bits))
return GraphIndex(handle)
class SubgraphIndex(GraphIndex):
"""Graph index for subgraph.
......
"""Module for dgl kernels for graph computation."""
from __future__ import absolute_import
from ._ffi.function import _init_api
from .ndarray import empty
def infer_binary_feature_shape(lhs, rhs):
"""Infer the output feature shape after a binary operation between lhs and rhs.
Parameters
----------
lhs : dgl.ndarray.NDArray
The lhs tensor.
rhs : dgl.ndarray.NDArray
The rhs tensor.
Returns
-------
tuple of int
The output feature shape.
"""
ret = _CAPI_DGLKernelInferBinaryFeatureShape(lhs, rhs)
return tuple(ret.asnumpy())
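# Shape sketch only (plain numpy, illustrative): as documented below,
# broadcasting follows numpy semantics on the feature dimensions, i.e. on
# everything after the leading row axis.
import numpy as np
lhs_feat = (5, 1)                           # lhs per-row feature shape
rhs_feat = (1, 4)                           # rhs per-row feature shape
out_feat = np.broadcast(np.empty(lhs_feat), np.empty(rhs_feat)).shape
print(out_feat)                             # (5, 4)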
# pylint: disable=invalid-name
def binary_op_reduce(reducer, op, G, A_target, B_target, A, B, out,
A_rows=None, B_rows=None, out_rows=None):
"""Perform binary operation on the edges of graph ``G``, and optionally
reduce the per-edge result by edge destinations into per-node result.
Details
-------
Concretely, this function could be decomposed into two steps:
1. Perform binary operations on each edge (u, v, e) on graph ``G`` as
follows,::
C[e] = A[select_A_target(u, v, e)] op B[select_B_target(u, v, e)]
where
* ``select_A_target`` and ``select_B_target`` would return the source
node ID, destination node ID, or edge ID, according to ``A_target``
and ``B_target`` which could take either
- "source" (0),
- "destination" (1), or
- "edge" (2).
* ``A`` and ``B`` are data tensors. If ``A_target`` is "edge", then
``A.shape[0]`` should equal the number of edges of ``G``. Otherwise
that should equal the number of nodes of ``G``. Similar constraints
apply for ``B``.
* ``op`` could be either of the following strings: "add", "mul", "sub",
"div".
2. Perform the optional reduction step on ``C`` computed previously.
* If ``reducer`` is None, then no reduction is performed and we return
the per-edge result ``C`` directly,::
out[e] = C[e]
* Otherwise, the per-edge result ``C`` is reduced into per-node result
according to edge destinations, in a similar fashion as
``unsorted_segment_XXX`` in Tensorflow or ``scatter_XXX`` in PyTorch
or PyTorch-Scatter. For all ``v`` that has incoming edges,::
out[v] = reducer_{e: (u, v, e) in G} C[e]
Broadcasting
------------
Broadcasting is supported on the feature dimensions, following numpy
semantics.
Examples::
A.shape = (N, D1, D2) # N is the number of nodes
B.shape = (M, D1, 1) # M is the number of edges
C = BinaryOpReduce("sum", "add", graph, A, B, ...)
C.shape = (N, D1, D2)
Partial reads/writes
--------------------
Optionally, one can provide which rows to read from ``A`` and ``B`` with
``A_rows`` and ``B_rows``, both of which are 1D integer arrays. Similarly,
one can provide which rows to write to ``out`` with ``out_rows``, which is
again a 1D integer array. Concretely,
* Instead of from ``A`` and ``B``, ``C`` would be computed from
``A[A_rows]`` and ``B[B_rows]``. This implies that
* ``A`` and ``B`` no longer need to have the same number of rows as
the number of nodes or edges in ``G``. However, ``A_rows`` and
``B_rows`` must have the same number of elements as the number of
nodes or edges in ``G``.
* Instead of directly writing to ``out``, it will selectively write some
rows of ``C`` or reduced ``C``,::
out[out_rows[i]] = C[i] if out_rows[i] != -1
Or
out[out_rows[i]] = reducer_{e: (u, v, e) in G} C[e]
Parameters
----------
reducer : str
The type of the reducer ("sum", "max", "min", "mean", "prod", "none").
If the reducer is "none", the output is an edge feature tensor.
Otherwise, a node feature tensor is returned.
op : str
The type of the binary functor ("add", "mul", "sub", "div").
G : GraphIndex
The graph
A_target : int
Choice of source, destination, or edge ID for edges on left operand
B_target : int
Choice of source, destination, or edge ID for edges on right operand
A : NDArray
Data tensor of left operand
B : NDArray
Data tensor of right operand
out : NDArray (output)
Output tensor. The result will be written there in place.
A_rows : NDArray, optional
The rows to read from A.
B_rows : NDArray, optional
The rows to read from B.
out_rows : NDArray
The rows to write to output tensor.
"""
if A_rows is None:
A_rows = empty([])
if B_rows is None:
B_rows = empty([])
if out_rows is None:
out_rows = empty([])
_CAPI_DGLKernelBinaryOpReduce(
reducer, op, G._handle,
int(A_target), int(B_target),
A, B, out,
A_rows, B_rows, out_rows)
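# Reference semantics in plain numpy (illustrative only, not the kernel):
# step 1 computes C[e] = A[src(e)] * B[e]; step 2 sum-reduces C by edge
# destination, i.e. reducer "sum" with A read by source id and B by edge id.
import numpy as np
src = np.array([0, 1, 2])
dst = np.array([1, 2, 0])                   # toy graph: 0->1, 1->2, 2->0
A = np.ones((3, 4))                         # per-node data
B = np.array([[1.0], [2.0], [3.0]])         # per-edge data, broadcast over features
C = A[src] * B                              # per-edge result ("mul")
out = np.zeros((3, 4))
np.add.at(out, dst, C)                      # "sum" reducer by destination
print(out[:, 0])                            # [3. 1. 2.]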
# pylint: disable=invalid-name
def backward_lhs_binary_op_reduce(
reducer, op, G,
A_target, B_target,
A, B, out,
grad_out, grad_A,
A_rows=None, B_rows=None, out_rows=None):
"""Compute the gradient of ``binary_op_reduce`` w.r.t. ``A`` and store it
in ``grad_A``.
See ``binary_op_reduce`` for forward propagation and partial reads/writes.
Gradient of broadcasted tensors
-------------------------------
``grad_A`` is assumed to be unbroadcasted, i.e. the shape of ``grad_A``
is the same as ``grad_out`` except the first axis.
If broadcasting happened in forward propagation, one needs to manually
sum the gradients along the broadcasted dimension to yield the correct
gradient.
Parameters
----------
reducer : str
The type of the reducer ("sum", "max", "min", "mean", "prod", "none").
If the reducer is "none", the output is an edge feature tensor.
Otherwise, a node feature tensor is returned.
op : str
The type of the binary functor ("add", "mul", "sub", "div").
G : GraphIndex
The graph
A_target : int
Choice of source, destination, or edge ID for edges on left operand
B_target : int
Choice of source, destination, or edge ID for edges on right operand
A : NDArray
Data tensor of left operand
B : NDArray
Data tensor of right operand
out : NDArray
Output tensor computed in the forward pass.
grad_out : NDArray
Gradient w.r.t. ``out``.
grad_A : NDArray (output)
Gradient w.r.t. ``A``. The result will be written there in place.
A_rows : NDArray, optional
The rows read from A.
B_rows : NDArray, optional
The rows read from B.
out_rows : NDArray, optional
The rows written to the output tensor.
"""
if A_rows is None:
A_rows = empty([])
if B_rows is None:
B_rows = empty([])
if out_rows is None:
out_rows = empty([])
_CAPI_DGLKernelBackwardLhsBinaryOpReduce(
reducer, op, G._handle,
int(A_target), int(B_target),
A_rows, B_rows, out_rows,
A, B, out,
grad_out, grad_A)
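# A minimal sketch (hypothetical shapes) of recovering the gradient of a
# broadcasted operand, as described above: ``grad_A`` comes back with the
# feature shape of ``grad_out``, so the caller sums over the dimensions that
# were broadcast in the forward pass.
#
#   # forward: A of shape (N, D1, 1) broadcast against B of shape (M, D1, D2),
#   # hence grad_A must be allocated with shape (N, D1, D2)
#   backward_lhs_binary_op_reduce("sum", "mul", gidx, 0, 2,
#                                 A, B, out, grad_out, grad_A)
#   # then reduce grad_A back to A's shape in the caller's tensor framework,
#   # e.g. by summing over axis 2 with keepdims=True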
# pylint: disable=invalid-name
def backward_rhs_binary_op_reduce(
reducer, op, G,
A_target, B_target,
A, B, out,
grad_out, grad_B,
A_rows=None, B_rows=None, out_rows=None):
"""Compute the gradient of ``binary_op_reduce`` w.r.t. ``B`` and store it
in ``grad_B``.
See ``binary_op_reduce`` for forward propagation and partial reads/writes.
Gradient of broadcasted tensors
-------------------------------
``grad_B`` is assumed to be unbroadcasted, i.e. the shape of ``grad_B``
is the same as ``grad_out`` except the first axis.
If broadcasting happened in forward propagation, one needs to manually
sum the gradients along the broadcasted dimension to yield the correct
gradient.
Parameters
----------
reducer : str
The type of the reducer ("sum", "max", "min", "mean", "prod", "none").
If the reducer is "none", the output is an edge feature tensor.
Otherwise, a node feature tensor is returned.
op : str
The type of the binary functor ("add", "mul", "sub", "div").
G : GraphIndex
The graph
A_target : int
Choice of source, destination, or edge ID for edges on left operand
B_target : int
Choice of source, destination, or edge ID for edges on right operand
A : NDArray
Data tensor of left operand
B : NDArray
Data tensor of right operand
out : NDArray
Output tensor computed in the forward pass.
grad_out : NDArray
Gradient w.r.t. ``out``.
grad_B : NDArray (output)
Gradient w.r.t. ``B``. The result will be written there in place.
A_rows : NDArray, optional
The rows read from A.
B_rows : NDArray, optional
The rows read from B.
out_rows : NDArray, optional
The rows written to the output tensor.
"""
if A_rows is None:
A_rows = empty([])
if B_rows is None:
B_rows = empty([])
if out_rows is None:
out_rows = empty([])
_CAPI_DGLKernelBackwardRhsBinaryOpReduce(
reducer, op, G._handle,
int(A_target), int(B_target),
A_rows, B_rows, out_rows,
A, B, out,
grad_out, grad_B)
# pylint: disable=invalid-name
def copy_reduce(reducer, G, target,
X, out,
X_rows=None, out_rows=None):
"""Copy data in ``X`` according to source/destination/edge ID onto the
edges of graph ``G``, and optionally reduce the per-edge result by edge
destinations into per-node result.
Details
-------
Concretely, this function could be decomposed into two steps:
1. For each edge (u, v, e) on graph ``G``, set
C[e] = X[select_target(u, v, e)]
where
* ``select_target`` would return the source node ID, destination node
ID, or edge ID, according to ``target``, which could take either
- "source" (0),
- "destination" (1), or
- "edge" (2)
* ``X`` is a data tensor. If ``target`` is "edge", then ``X.shape[0]``
should equal the number of edges of ``G``. Otherwise it should
equal the number of nodes of ``G``.
2. Perform the optional reduction step on ``C`` computed previously.
* If ``reducer`` is "none", then no reduction is performed and we return
the per-edge result ``C`` directly,::
out[e] = C[e]
* Otherwise, the per-edge result ``C`` is reduced into per-node result
according to edge destinations, in a similar fashion to
``unsorted_segment_XXX`` in TensorFlow or ``scatter_XXX`` in PyTorch
or PyTorch-Scatter. For all ``v`` that have incoming edges,::
out[v] = reducer_{e: (u, v, e) in G} C[e]
Partial reads/writes
--------------------
Optionally, one can provide which rows to read from ``X`` with ``X_rows``,
which is a 1D integer array. Similarly, one can provide which rows to
write to ``out`` with ``out_rows``, which is again a 1D integer array.
Concretely,
* Instead of from ``X``, ``C`` would be copied from ``X[X_rows]``. This
implies that
* ``X`` no longer needs to have the same number of rows as the number of
nodes or edges in ``G``. However, ``X_rows`` must have the same
number of elements as the number of nodes or edges in ``G``.
* Instead of directly writing to ``out``, it will selectively write some
rows of ``C`` or reduced ``C``,::
out[out_rows[i]] = C[i] if out_rows[i] != -1
Or
out[out_rows[i]] = reducer_{e: (u, v, e) in G} C[e]
Parameters
----------
reducer : str
The type of the reducer ("sum", "max", "min", "mean", "prod", "none").
If the reducer is "none", the output is an edge feature tensor.
Otherwise, a node feature tensor is returned.
G : GraphIndex
The graph
target : int
Choice of source, destination, or edge ID for edges to index in data
tensor.
X : NDArray
Data tensor.
out : NDArray (output)
Output tensor. The result will be written there in place.
X_rows : NDArray, optional
The rows to read from X.
out_rows : NDArray, optional
The rows to write to the output tensor.
"""
if X_rows is None:
X_rows = empty([])
if out_rows is None:
out_rows = empty([])
_CAPI_DGLKernelCopyReduce(
reducer, G._handle, int(target),
X, out, X_rows, out_rows)
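# A minimal usage sketch (hypothetical names): the classic copy-source-and-sum
# message passing pattern expressed with this kernel.
#
#   N = gidx.number_of_nodes()
#   # X: (N, D) source node features; out: (N, D), pre-allocated by the caller
#   copy_reduce("sum", gidx, 0, X, out)   # 0 == "source"
#
# With ``reducer`` set to "none", the same call would instead write one row
# per edge into an ``out`` of shape (number_of_edges, D).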
# pylint: disable=invalid-name
def backward_copy_reduce(reducer, G, target,
X, out,
grad_out, grad_X,
X_rows=None, out_rows=None):
"""Compute the gradient of ``copy_reduce`` w.r.t. ``X`` and store it in
``grad_X``.
See ``copy_reduce`` for forward propagation and partial reads/writes.
Parameters
----------
reducer : str
The type of the reducer ("sum", "max", "min", "mean", "prod", "none").
If the reducer is "none", the output is an edge feature tensor.
Otherwise, a node feature tensor is returned.
G : GraphIndex
The graph
target : int
Choice of source, destination, or edge ID for edges to index in data
tensor.
X : NDArray
Data tensor.
out : NDArray
Output tensor computed in the forward pass.
grad_out : NDArray
Gradient w.r.t. ``out``.
grad_X : NDArray (output)
Gradient w.r.t. ``X``. The result will be written there in place.
X_rows : NDArray, optional
The rows read from X.
out_rows : NDArray, optional
The rows written to the output tensor.
"""
if X_rows is None:
X_rows = empty([])
if out_rows is None:
out_rows = empty([])
_CAPI_DGLKernelBackwardCopyReduce(
reducer, G._handle, int(target),
X, out, grad_out, grad_X,
X_rows, out_rows)
_init_api("dgl.kernel")
"""Torch modules for graph related softmax."""
# pylint: disable= no-member, arguments-differ
import torch as th
from torch import nn
from ... import backend as F
from ... import utils
from ... import function as fn
from ...utils import get_ndata_name
__all__ = ['EdgeSoftmax']
__all__ = ['EdgeSoftmax', 'edge_softmax']
class EdgeSoftmax(nn.Module):
class EdgeSoftmax(object):
r"""Apply softmax over signals of incoming edges.
For a node :math:`i`, edgesoftmax is an operation of computing
......@@ -24,22 +25,16 @@ class EdgeSoftmax(nn.Module):
`Graph Attention Network <https://arxiv.org/pdf/1710.10903.pdf>`__ where
the attention weights are computed with such an edgesoftmax operation.
"""
def __init__(self):
super(EdgeSoftmax, self).__init__()
# compute the softmax
self._logits_name = "_logits"
self._max_logits_name = "_max_logits"
self._normalizer_name = "_norm"
def forward(self, logits, graph):
def __call__(self, graph, logits):
r"""Compute edge softmax.
Parameters
----------
graph : DGLGraph
The graph to perform edge softmax
logits : torch.Tensor
The input edge feature
graph : DGLGraph
The graph.
Returns
-------
......@@ -50,46 +45,89 @@ class EdgeSoftmax(nn.Module):
Notes
-----
* Input shape: :math:`(N, *, 1)` where * means any number of additional
dimensions, :math:`N` is the number of edges.
* Unnormalized scores shape: :math:`(N, *, 1)` where all but the last
dimension are the same shape as the input.
* Normalizer shape: :math:`(M, *, 1)` where :math:`M` is the number of
nodes and all but the first and the last dimensions are the same as
the input.
* Input shape: :math:`(N, *, 1)` where * means any number of
additional dimensions, :math:`N` is the number of edges.
* Unnormalized scores shape: :math:`(N, *, 1)` where all but the
last dimension are the same shape as the input.
* Normalizer shape: :math:`(M, *, 1)` where :math:`M` is the number
of nodes and all but the first and the last dimensions are the
same as the input.
Note that this computation is still one step away from getting real softmax
results. The last step can be proceeded as follows:
Note that this computation is still one step away from getting the real
softmax results. The last step can be performed as follows:
>>> import dgl.function as fn
>>>
>>> scores, normalizer = EdgeSoftmax(...).forward(logits, graph)
>>> scores, normalizer = EdgeSoftmax(logits, graph)
>>> graph.edata['a'] = scores
>>> graph.ndata['normalizer'] = normalizer
>>> graph.apply_edges(lambda edges : {'a' : edges.data['a'] / edges.dst['normalizer']})
>>> graph.apply_edges(
lambda edges: {'a': edges.data['a'] / edges.dst['normalizer']})
We left this last step to users as depending on the particular use case,
this step can be combined with other computation at once.
We leave this last step to users because, depending on the particular use
case, it can be combined with other computation.
"""
num_nodes = graph.number_of_nodes()
ctx = utils.to_dgl_context(F.context(logits))
gidx = graph._graph.get_immutable_gidx(ctx)
_, dst, _ = graph._graph.edges()
dst = dst.tousertensor(F.context(logits))
empty_map = (None, None)
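# Numerically stable softmax over incoming edges: take the per-destination
# max with a copy-reduce, subtract it and exponentiate on each edge, then
# normalize by the per-destination sum.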
max_logits_ = F.copy_reduce("max", gidx, fn.TargetCode.EDGE, logits,
num_nodes, empty_map, empty_map)
logits = (logits - max_logits_.index_select(0, dst)).exp()
norm = F.copy_reduce("sum", gidx, fn.TargetCode.EDGE, logits,
num_nodes, empty_map, empty_map)
return logits / norm.index_select(0, dst)
class EdgeSoftmax1(th.autograd.Function):
"""EdgeSoftmax implementation with DGL message passing APIs"""
@staticmethod
def forward(ctx, g, score):
"""
score = dgl.EData(g, score)
score_max = score.dst_max() # of type dgl.NData
score = score - score_max # edge_sub_dst, ret dgl.EData
score_sum = score.dst_sum() # of type dgl.NData
out = score / score_sum # edge_div_dst, ret dgl.EData
return out.data
"""
score_name = utils.get_edata_name(g, 'score')
tmp_name = utils.get_ndata_name(g, 'tmp')
out_name = utils.get_edata_name(g, 'out')
g.edata[score_name] = score
g.update_all(fn.copy_e(score_name, 'm'), fn.max('m', tmp_name))
g.apply_edges(fn.e_sub_v(score_name, tmp_name, out_name))
g.edata[out_name] = th.exp(g.edata[out_name])
g.update_all(fn.copy_e(out_name, 'm'), fn.sum('m', tmp_name))
g.apply_edges(fn.e_div_v(out_name, tmp_name, out_name))
g.edata.pop(score_name)
g.ndata.pop(tmp_name)
out = g.edata.pop(out_name)
ctx.backward_cache = (g, out)
return out
@staticmethod
def backward(ctx, grad_out):
"""
g, out = ctx.backward_cache
grad_out = dgl.EData(g, grad_out)
out = dgl.EData(g, out)
sds = out * grad_out # type dgl.EData
sds_sum = sds.dst_sum() # type dgl.NData
grad_score = sds - sds * sds_sum # multiple expressions
return grad_score.data
"""
self._logits_name = get_ndata_name(graph, self._logits_name)
self._max_logits_name = get_ndata_name(graph, self._max_logits_name)
self._normalizer_name = get_ndata_name(graph, self._normalizer_name)
graph.edata[self._logits_name] = logits
# compute the softmax
graph.update_all(fn.copy_edge(self._logits_name, self._logits_name),
fn.max(self._logits_name, self._max_logits_name))
# minus the max and exp
graph.apply_edges(
lambda edges: {self._logits_name : th.exp(edges.data[self._logits_name] -
edges.dst[self._max_logits_name])})
# pop out temporary feature _max_logits, otherwise get_ndata_name could have huge overhead
graph.ndata.pop(self._max_logits_name)
# compute normalizer
graph.update_all(fn.copy_edge(self._logits_name, self._logits_name),
fn.sum(self._logits_name, self._normalizer_name))
return graph.edata.pop(self._logits_name), graph.ndata.pop(self._normalizer_name)
def __repr__(self):
return 'EdgeSoftmax()'
g, out = ctx.backward_cache
out_name = utils.get_edata_name(g, 'out')
accum_name = utils.get_ndata_name(g, 'accum')
grad_score_name = utils.get_edata_name(g, 'grad_score')
g.edata[out_name] = out
g.edata[grad_score_name] = out * grad_out
g.update_all(fn.copy_e(grad_score_name, 'm'), fn.sum('m', accum_name))
g.apply_edges(fn.e_mul_v(out_name, accum_name, out_name))
grad_score = g.edata[grad_score_name] - g.edata[out_name]
return None, grad_score
edge_softmax = EdgeSoftmax1.apply # pylint: disable=invalid-name
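# A minimal usage sketch (``g`` and ``scores`` are illustrative names):
# normalize unnormalized attention logits over each node's incoming edges.
#
#   import torch as th
#   import dgl
#   g = dgl.DGLGraph()
#   g.add_nodes(3)
#   g.add_edges([0, 1, 2], [2, 2, 2])      # three edges pointing to node 2
#   scores = th.randn(g.number_of_edges(), 1, requires_grad=True)
#   alpha = edge_softmax(g, scores)        # entries for edges sharing a
#                                          # destination sum to one
#
# The backward pass above computes the softmax Jacobian-vector product
#   grad_score[e] = out[e] * grad_out[e]
#                   - out[e] * sum_{e' -> dst(e)} out[e'] * grad_out[e']
# where the sum runs over all edges sharing the destination of edge ``e``.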
......@@ -152,7 +152,7 @@ class NodeFlow(DGLBaseGraph):
block_id = self._get_block_id(block_id)
return int(self._block_offsets[block_id + 1]) - int(self._block_offsets[block_id])
def copy_from_parent(self, node_embed_names=ALL, edge_embed_names=ALL, ctx=F.cpu()):
def copy_from_parent(self, node_embed_names=ALL, edge_embed_names=ALL, ctx=None):
"""Copy node/edge features from the parent graph.
Parameters
......@@ -161,6 +161,8 @@ class NodeFlow(DGLBaseGraph):
The names of embeddings in each layer.
edge_embed_names : a list of lists of strings, optional
The names of embeddings in each block.
ctx : Context
The device to copy the tensors to. If None, features stay on their original device.
"""
if self._parent._node_frame.num_rows != 0 and self._parent._node_frame.num_columns != 0:
if is_all(node_embed_names):
......@@ -244,7 +246,8 @@ class NodeFlow(DGLBaseGraph):
Tensor
The parent node id array.
"""
return self._node_mapping.tousertensor()[nid]
nid = utils.toindex(nid)
return self._node_mapping.tousertensor()[nid.tousertensor()]
def map_to_parent_eid(self, eid):
"""This maps the child edge Ids to the parent Ids.
......@@ -259,7 +262,8 @@ class NodeFlow(DGLBaseGraph):
Tensor
The parent edge id array.
"""
return self._edge_mapping.tousertensor()[eid]
eid = utils.toindex(eid)
return self._edge_mapping.tousertensor()[eid.tousertensor()]
def map_from_parent_nid(self, layer_id, parent_nids):
"""Map parent node Ids to NodeFlow node Ids in a certain layer.
......@@ -398,13 +402,18 @@ class NodeFlow(DGLBaseGraph):
assert F.asnumpy(F.sum(ret == -1, 0)) == 0, "The eid in the parent graph is invalid."
return ret
def block_edges(self, block_id):
def block_edges(self, block_id, remap=False):
"""Return the edges in a block.
If remap is True, the returned indices u, v, eid are remapped to local
indices (i.e. starting from 0).
Parameters
----------
block_id : int
The specified block to return the edges.
remap : bool, optional
If True, remap the returned indices to local indices.
Returns
-------
......@@ -420,7 +429,8 @@ class NodeFlow(DGLBaseGraph):
rst = _CAPI_NodeFlowGetBlockAdj(self._graph._handle, "coo",
int(layer0_size),
int(self._layer_offsets[block_id + 1]),
int(self._layer_offsets[block_id + 2]))
int(self._layer_offsets[block_id + 2]),
remap)
idx = utils.toindex(rst(0)).tousertensor()
eid = utils.toindex(rst(1))
num_edges = int(len(idx) / 2)
......@@ -455,7 +465,8 @@ class NodeFlow(DGLBaseGraph):
rst = _CAPI_NodeFlowGetBlockAdj(self._graph._handle, fmt,
int(layer0_size),
int(self._layer_offsets[block_id + 1]),
int(self._layer_offsets[block_id + 2]))
int(self._layer_offsets[block_id + 2]),
True)
num_rows = self.layer_size(block_id + 1)
num_cols = self.layer_size(block_id)
......@@ -515,7 +526,7 @@ class NodeFlow(DGLBaseGraph):
if shuffle is not required.
"""
block_id = self._get_block_id(block_id)
src, dst, eid = self.block_edges(block_id)
src, dst, eid = self.block_edges(block_id, remap=True)
src = F.copy_to(src, ctx) # the index of the ctx will be cached
dst = F.copy_to(dst, ctx) # the index of the ctx will be cached
eid = F.copy_to(eid, ctx) # the index of the ctx will be cached
......@@ -740,7 +751,7 @@ class NodeFlow(DGLBaseGraph):
assert func is not None
if is_all(edges):
u, v, _ = self.block_edges(block_id)
u, v, _ = self.block_edges(block_id, remap=True)
u = utils.toindex(u)
v = utils.toindex(v)
eid = utils.toindex(slice(0, self.block_size(block_id)))
......@@ -827,8 +838,8 @@ class NodeFlow(DGLBaseGraph):
assert len(u) > 0, "block_compute must run on edges"
u = utils.toindex(self._glb2lcl_nid(u.tousertensor(), block_id))
v = utils.toindex(self._glb2lcl_nid(v.tousertensor(), block_id + 1))
dest_nodes = utils.toindex(self._glb2lcl_nid(dest_nodes.tousertensor(),
block_id + 1))
dest_nodes = utils.toindex(
self._glb2lcl_nid(dest_nodes.tousertensor(), block_id + 1))
eid = utils.toindex(self._glb2lcl_eid(eid.tousertensor(), block_id))
with ir.prog() as prog:
......@@ -909,15 +920,22 @@ def _copy_to_like(arr1, arr2):
return F.copy_to(arr1, F.context(arr2))
def _get_frame(frame, names, ids, ctx):
col_dict = {name: F.copy_to(frame[name][_copy_to_like(ids, frame[name])], \
ctx) for name in names}
col_dict = {}
for name in names:
col = frame[name][_copy_to_like(ids, frame[name])]
if ctx:
col = F.copy_to(col, ctx)
col_dict[name] = col
if len(col_dict) == 0:
return FrameRef(Frame(num_rows=len(ids)))
else:
return FrameRef(Frame(col_dict))
def _copy_frame(frame, ctx):
return {name: F.copy_to(frame[name], ctx) for name in frame}
new_frame = {}
for name in frame:
new_frame[name] = F.copy_to(frame[name], ctx) if ctx else frame[name]
return new_frame
def _update_frame(frame, names, ids, new_frame):
......
......@@ -3,8 +3,6 @@
from __future__ import absolute_import
from abc import abstractmethod
import functools
import operator
from ... import backend as F
from ...frame import FrameRef, Frame
......@@ -19,8 +17,6 @@ __all__ = [
'OpCode', 'Executor',
'NodeUDFExecutor', 'NODE_UDF',
'EdgeUDFExecutor', 'EDGE_UDF',
'SPMVExecutor', 'SPMV',
'SPMVWithDataExecutor', 'SPMV_WITH_DATA',
'ReadExecutor', 'READ',
'ReadColExecutor', 'READ_COL',
'ReadRowExecutor', 'READ_ROW',
......@@ -34,15 +30,16 @@ __all__ = [
'AppendRow_Executor', 'APPEND_ROW_',
'WriteRowInplace_Executor', 'WRITE_ROW_INPLACE_',
'ClearFrame_Executor', 'CLEAR_FRAME_',
'BinaryReduceExecutor', 'BINARY_REDUCE',
'CopyReduceExecutor', 'COPY_REDUCE',
]
class OpCode(object):
"""Opcode for all the executor types."""
# immutable op
NODE_UDF = 0
EDGE_UDF = 1
SPMV = 2
SPMV_WITH_DATA = 3
READ = 4
READ_COL = 5
READ_ROW = 6
......@@ -58,6 +55,10 @@ class OpCode(object):
APPEND_ROW_ = 25
WRITE_ROW_INPLACE_ = 26
CLEAR_FRAME_ = 27
# DGL kernels
BINARY_REDUCE = 50
COPY_REDUCE = 51
class Executor(object):
"""Base executor class.
......@@ -422,181 +423,6 @@ def READ_ROW(fd, row, ret=None):
get_current_prog().issue(reg['executor_cls'](fd, row, ret))
return ret
class SPMVExecutor(Executor):
"""Executor for sparse-matrix-dense-matrix multiply.
Parameters
----------
spA : var.Var
Variable for sparse matrix lambda. The lambda returns the sparse matrix
given a context object.
B : var.Var
Variable for the dense feature tensor.
ret : var.Var
Variable for the result.
"""
def __init__(self, spA, B, ret):
self.spA = spA
self.B = B
self.ret = ret
def opcode(self):
return OpCode.SPMV
def arg_vars(self):
return [self.spA, self.B]
def ret_var(self):
return self.ret
def run(self):
spA_ctx_fn = self.spA.data
B = self.B.data
ctx = F.context(B)
spA = spA_ctx_fn(ctx)
if F.ndim(B) == 1:
# B is a vector, append a (1,) dim at the end
B = F.unsqueeze(B, 1)
C = F.spmm(spA, B)
C = F.squeeze(C, 1)
elif F.ndim(B) > 2:
# Flatten the dim 1:~
B_shape = F.shape(B)
feat_shape = B_shape[1:]
tmp_B_shape = (B_shape[0], functools.reduce(operator.mul, feat_shape, 1))
B = F.reshape(B, tmp_B_shape)
C = F.spmm(spA, B)
C_shape = (F.shape(C)[0],) + feat_shape
C = F.reshape(C, C_shape)
else:
C = F.spmm(spA, B)
self.ret.data = C
IR_REGISTRY[OpCode.SPMV] = {
'name' : 'SPMV',
'args_type' : [VarType.SPMAT, VarType.FEAT],
'ret_type' : VarType.FEAT,
'executor_cls' : SPMVExecutor,
}
def SPMV(spA, B, ret=None):
"""Perform sparse-matrix-dense-matrix multiply symbolically.
Parameters
----------
spA : var.Var
Variable for sparse matrix lambda. The lambda returns the sparse matrix
given a context object.
B : var.Var
Variable for the dense feature tensor.
ret : var.Var, optional
Variable for the result. If not given, a new variable will be created.
Returns
-------
var.Var
Variable for the result.
"""
reg = IR_REGISTRY[OpCode.SPMV]
ret = var.new(reg['ret_type']) if ret is None else ret
get_current_prog().issue(reg['executor_cls'](spA, B, ret))
return ret
class SPMVWithDataExecutor(Executor):
"""Executor for sparse-matrix-dense-matrix multiply with provided sparse data.
Parameters
----------
spA : var.Var
Variable for sparse matrix lambda. The lambda returns the sparse matrix
given a context object.
A_data : var.Var
Variable for the sparse matrix data.
B : var.Var
Variable for the dense feature tensor.
ret : var.Var
Variable for the result.
"""
def __init__(self, spA, A_data, B, ret):
self.spA = spA
self.A_data = A_data
self.B = B
self.ret = ret
def opcode(self):
return OpCode.SPMV_WITH_DATA
def arg_vars(self):
return [self.spA, self.A_data, self.B]
def ret_var(self):
return self.ret
def run(self):
spA_ctx_fn = self.spA.data
A_data = self.A_data.data
if F.ndim(A_data) > 1:
# A_data is of shape (E, 1). Squeeze the last dim.
A_data = F.squeeze(A_data, 1)
B = self.B.data
ctx = F.context(B)
spA = spA_ctx_fn(ctx)
spidx = F.sparse_matrix_indices(spA)
shape = F.shape(spA)
# shuffle index is not used
spA, _ = F.sparse_matrix(A_data, spidx, shape)
if F.ndim(B) == 1:
# B is a vector, append a (1,) dim at the end
B = F.unsqueeze(B, 1)
C = F.spmm(spA, B)
C = F.squeeze(C, 1)
elif F.ndim(B) > 2:
# Flatten the dim 1:~
B_shape = F.shape(B)
feat_shape = B_shape[1:]
tmp_B_shape = (B_shape[0], functools.reduce(operator.mul, feat_shape, 1))
B = F.reshape(B, tmp_B_shape)
C = F.spmm(spA, B)
C_shape = (F.shape(C)[0],) + feat_shape
C = F.reshape(C, C_shape)
else:
C = F.spmm(spA, B)
self.ret.data = C
IR_REGISTRY[OpCode.SPMV_WITH_DATA] = {
'name' : 'SPMV_WITH_DATA',
'args_type' : [VarType.SPMAT, VarType.FEAT, VarType.FEAT],
'ret_type' : VarType.FEAT,
'executor_cls' : SPMVWithDataExecutor,
}
def SPMV_WITH_DATA(spA, A_data, B, ret=None):
"""Perform sparse-matrix-dense-matrix multiply with sparse data symbolically.
Parameters
----------
spA : var.Var
Variable for sparse matrix lambda. The lambda returns the sparse matrix
given a context object.
A_data : var.Var
Variable for the sparse matrix data.
B : var.Var
Variable for the dense feature tensor.
ret : var.Var, optional
Variable for the result. If not given, a new variable will be created.
Returns
-------
var.Var
Variable for the result.
"""
reg = IR_REGISTRY[OpCode.SPMV_WITH_DATA]
ret = var.new(reg['ret_type']) if ret is None else ret
get_current_prog().issue(reg['executor_cls'](spA, A_data, B, ret))
return ret
class MergeRowExecutor(Executor):
"""Executor for merge row data according to the given order.
......@@ -1169,3 +995,254 @@ def CLEAR_FRAME_(fd):
"""
reg = IR_REGISTRY[OpCode.CLEAR_FRAME_]
get_current_prog().issue(reg['executor_cls'](fd))
class BinaryReduceExecutor(Executor):
"""Executor for BINARY_REDUCE
Parameters
----------
reducer : str
String representing reduction to perform, can be "sum", "max", "min",
"mean", "prod", "none" (no reduction)
binary_op : str
String representing binary operation to perform, can be "add", "mul",
"sub", "div", "dot"
graph : var.Var
Variable for graph index lambda. The lambda returns the immutable graph
index given a context object.
lhs : int
The lhs target (src, dst, edge)
rhs : int
The rhs target (src, dst, edge)
lhs_data : var.Var
Variable for the lhs data
rhs_data : var.Var
Variable for the rhs data
out_size : int
Output size
lhs_map : var.Var
Variable for mapping lambda. The lambda returns the lhs id mapping
array on given context
rhs_map : var.Var
Variable for mapping lambda. The lambda returns the rhs id mapping
array on given context
out_map : var.Var
Variable for mapping lambda. The lambda returns the output id mapping
array on given context
ret : var.Var
Variable for the result.
"""
def __init__(self, reducer, binary_op, graph, lhs, rhs, lhs_data,
rhs_data, out_size, lhs_map, rhs_map, out_map, ret):
self.reducer = reducer
self.binary_op = binary_op
self.graph = graph
self.lhs = lhs
self.rhs = rhs
self.lhs_data = lhs_data
self.rhs_data = rhs_data
self.out_size = out_size
self.lhs_map = lhs_map
self.rhs_map = rhs_map
self.out_map = out_map
self.ret = ret
def opcode(self):
return OpCode.BINARY_REDUCE
def arg_vars(self):
return [self.reducer, self.binary_op, self.graph, self.lhs, self.rhs,
self.lhs_data, self.rhs_data, self.out_size, self.lhs_map,
self.rhs_map, self.out_map]
def ret_var(self):
return self.ret
def run(self):
lhs_data = self.lhs_data.data
rhs_data = self.rhs_data.data
ctx = utils.to_dgl_context(F.context(lhs_data))
graph = self.graph.data(ctx)
lhs_map = self.lhs_map.data(ctx) if self.lhs_map.data else None
rhs_map = self.rhs_map.data(ctx) if self.rhs_map.data else None
out_map = self.out_map.data(ctx) if self.out_map.data else None
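# F.binary_reduce takes each id mapping as a pair of arrays; when a single
# array (or None) is supplied here, duplicate it so both entries are set.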
if not isinstance(lhs_map, tuple):
lhs_map = (lhs_map, lhs_map)
if not isinstance(rhs_map, tuple):
rhs_map = (rhs_map, rhs_map)
if not isinstance(out_map, tuple):
out_map = (out_map, out_map)
self.ret.data = F.binary_reduce(
self.reducer, self.binary_op, graph, self.lhs, self.rhs,
lhs_data, rhs_data, self.out_size, lhs_map, rhs_map, out_map)
IR_REGISTRY[OpCode.BINARY_REDUCE] = {
'name': 'BINARY_REDUCE',
'args_type': [VarType.STR, VarType.STR, VarType.GRAPH, VarType.INT,
VarType.INT, VarType.FEAT, VarType.FEAT, VarType.INT,
VarType.MAP, VarType.MAP, VarType.MAP],
'ret_type': VarType.FEAT,
'executor_cls': BinaryReduceExecutor,
}
def BINARY_REDUCE(reducer, binary_op, graph, lhs, rhs, lhs_data, rhs_data,
out_size, lhs_map, rhs_map, out_map, ret=None):
"""Perform BINARY_REDUCE symbolically.
Parameters
----------
reducer : str
String representing reduction to perform, can be "sum", "max", "min",
"mean", "prod", "none" (no reduction)
binary_op : str
String representing binary operation to perform, can be "add", "mul",
"sub", "div", "dot"
graph : var.Var
Variable for graph index lambda. The lambda returns the immutable graph
index given a context object.
lhs : int
The lhs target (src, dst, edge)
rhs : int
The rhs target (src, dst, edge)
lhs_data : var.Var
Variable for the lhs data
rhs_data : var.Var
Variable for the rhs data
out_size : int
Output size
lhs_map : var.Var
Variable for mapping lambda. The lambda returns the lhs id mapping
array on given context
rhs_map : var.Var
Variable for mapping lambda. The lambda returns the rhs id mapping
array on given context
out_map : var.Var
Variable for mapping lambda. The lambda returns the output id mapping
array on given context
ret : var.Var, optional
Variable for the result. If not given, a new variable will be created.
Returns
-------
var.Var
Variable for the result.
"""
reg = IR_REGISTRY[OpCode.BINARY_REDUCE]
ret = var.new(reg['ret_type']) if ret is None else ret
get_current_prog().issue(reg['executor_cls'](
reducer, binary_op, graph, lhs, rhs, lhs_data, rhs_data, out_size,
lhs_map, rhs_map, out_map, ret))
return ret
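# A minimal sketch (hypothetical variables) of issuing this op from a
# scheduler: the call only records the executor into the current program;
# F.binary_reduce runs later, when the runtime executes the program.
#
#   with ir.prog() as prog:
#       var_ret = BINARY_REDUCE("sum", "mul", var_graph,
#                               0, 2,     # lhs by source id, rhs by edge id
#                               var_src_feat, var_edge_feat, out_size,
#                               var_lhs_map, var_rhs_map, var_out_map)
#   Runtime.run(prog)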
class CopyReduceExecutor(Executor):
"""Executor for COPY_REDUCE
Parameters
----------
reducer : str
String representing reduction to perform, can be "sum", "max", "min",
"mean", "prod", "none" (no reduction)
graph : var.Var
Variable for graph index lambda. The lambda returns the immutable graph
index given a context object.
target : int
The input target (src, dst, edge)
in_data : var.Var
Variable for the input data
out_size : int
Output size
in_map : var.Var
Variable for mapping lambda. The lambda returns the input id mapping
array on given context
out_map : var.Var
Variable for mapping lambda. The lambda returns the output id mapping
array on given context
ret : var.Var
Variable for the result.
"""
def __init__(self, reducer, graph, target, in_data, out_size, in_map,
out_map, ret):
self.reducer = reducer
self.graph = graph
self.target = target
self.in_data = in_data
self.out_size = out_size
self.in_map = in_map
self.out_map = out_map
self.ret = ret
def opcode(self):
return OpCode.COPY_REDUCE
def arg_vars(self):
return [self.reducer, self.graph, self.target, self.in_data,
self.out_size, self.in_map, self.out_map]
def ret_var(self):
return self.ret
def run(self):
in_data = self.in_data.data
ctx = utils.to_dgl_context(F.context(in_data))
graph = self.graph.data(ctx)
in_map = self.in_map.data(ctx) if self.in_map.data else None
out_map = self.out_map.data(ctx) if self.out_map.data else None
if not isinstance(in_map, tuple):
in_map = (in_map, in_map)
if not isinstance(out_map, tuple):
out_map = (out_map, out_map)
self.ret.data = F.copy_reduce(
self.reducer, graph, self.target, in_data, self.out_size, in_map,
out_map)
IR_REGISTRY[OpCode.COPY_REDUCE] = {
'name': 'COPY_REDUCE',
'args_type': [VarType.STR, VarType.GRAPH, VarType.INT, VarType.FEAT, VarType.INT,
VarType.MAP, VarType.MAP],
'ret_type': VarType.FEAT,
'executor_cls': CopyReduceExecutor,
}
def COPY_REDUCE(reducer, graph, target, in_data, out_size, in_map, out_map,
ret=None):
"""Perform COPY_REDUCE symbolically.
Parameters
----------
reducer : str
String representing reduction to perform, can be "sum", "max", "min",
"mean", "prod", "none" (no reduction)
graph : var.Var
Variable for graph index lambda. The lambda returns the immutable graph
index given a context object.
target : int
The input target (src, dst, edge)
in_data : var.Var
Variable for the input data
out_size : int
Output size
in_map : var.Var
Variable for mapping lambda. The lambda returns the input id mapping
array on given context
out_map : var.Var
Variable for mapping lambda. The lambda returns the output id mapping
array on given context
ret : var.Var, optional
Variable for the result. If not given, a new variable will be created.
Returns
-------
var.Var
Variable for the result.
"""
reg = IR_REGISTRY[OpCode.COPY_REDUCE]
ret = var.new(reg['ret_type']) if ret is None else ret
get_current_prog().issue(reg['executor_cls'](
reducer, graph, target, in_data, out_size, in_map, out_map, ret))
return ret
......@@ -11,26 +11,30 @@ class VarType(object):
FEAT = 0
FEAT_DICT = 1
# Types for concrete objects (i.e, they must have values).
SPMAT = 2
GRAPH = 2
IDX = 3
STR = 4
FUNC = 5
MAP = 6
INT = 7
VAR_TYPE_NAME_MAP = [
'Feat',
'FeatDict',
'SpMat',
'GRAPH',
'Idx',
'Str',
'Func',
'Map',
'Int',
]
class Var(object):
"""Class for variables in IR.
Variables represent data in the IR. A variable can contain concrete values.
Otherwise, it can act as a "symbol", whose values are not materialized at the
moment, but later.
Otherwise, it can act as a "symbol" whose value is not materialized at
the moment but will be filled in later.
Parameters
----------
......@@ -42,6 +46,7 @@ class Var(object):
The data.
"""
__slots__ = ['name', 'typecode', 'data']
def __init__(self, name, typecode, data):
self.name = name
self.typecode = typecode
......@@ -73,9 +78,9 @@ def FEAT_DICT(data=None, name=None):
"""Create a variable for feature dict."""
return new(VarType.FEAT_DICT, data, name)
def SPMAT(data=None, name=None):
"""Create a variable for sparse matrix lambda."""
return new(VarType.SPMAT, data, name)
def GRAPH(data=None, name=None):
"""Create a variable for graph index lambda."""
return new(VarType.GRAPH, data, name)
def IDX(data=None, name=None):
"""Create a variable for index."""
......@@ -88,3 +93,11 @@ def STR(data=None, name=None):
def FUNC(data=None, name=None):
"""Create a variable for function."""
return new(VarType.FUNC, data, name)
def MAP(data=None, name=None):
"""Create a variable for mapping lambda"""
return new(VarType.MAP, data, name)
def INT(data=None, name=None):
"""Create a variable for int value"""
return new(VarType.INT, data, name)
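# A minimal sketch (hypothetical values): variables either carry concrete
# data now or stay symbolic until an executor fills them in.
#
#   v_size = INT(16)                      # concrete: value available now
#   v_map = MAP(cached_mapping)           # concrete: a context-cached object
#   v_out = FEAT_DICT(name='reduced')     # symbolic: set by an executor later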
"""DGL mini-runtime."""
class Runtime(object):
"""The mini runtime class."""
@staticmethod
def run(prog):
"""Run the given program."""
for exe in prog.execs:
#prog.pprint_exe(exe)
# prog.pprint_exe(exe)
exe.run()
......@@ -6,7 +6,7 @@ from .._ffi.function import _init_api
from ..base import DGLError
from .. import backend as F
from ..frame import frame_like, FrameRef
from ..function.base import BuiltinFunction, BundledFunction
from ..function.base import BuiltinFunction
from ..udf import EdgeBatch, NodeBatch
from . import ir
......@@ -14,6 +14,8 @@ from .ir import var
from . import degree_bucketing as db
from . import spmv
from .. import ndarray as nd
__all__ = [
"schedule_send",
"schedule_recv",
......@@ -42,19 +44,22 @@ def schedule_send(graph, u, v, eid, message_func):
message_func: callable or list of callable
The message function
"""
message_func = _standardize_func_usage(message_func, 'message')
mfunc_is_list = utils.is_iterable(message_func)
if mfunc_is_list:
message_func = BundledFunction(message_func)
# vars
var_mf = var.FEAT_DICT(graph._msg_frame)
var_nf = var.FEAT_DICT(graph._node_frame)
var_ef = var.FEAT_DICT(graph._edge_frame)
var_mf = var.FEAT_DICT(graph._msg_frame)
var_u = var.IDX(u)
var_v = var.IDX(v)
var_eid = var.IDX(eid)
msg = _gen_send(graph, var_nf, var_nf, var_ef, var_u, var_v, var_eid, message_func)
ir.WRITE_ROW_(var_mf, var_eid, msg)
var_msg = _gen_send(graph=graph,
u=u,
v=v,
eid=eid,
mfunc=message_func,
var_src_nf=var_nf,
var_dst_nf=var_nf,
var_ef=var_ef)
# write tmp msg back
ir.WRITE_ROW_(var_mf, var_eid, var_msg)
# set message indicator to 1
graph._set_msg_index(graph._get_msg_index().set_items(eid, 1))
......@@ -119,11 +124,15 @@ def schedule_snr(graph,
inplace):
"""Schedule send_and_recv.
Currently it builds a subgraph from edge_tuples with the same number of
nodes as the original graph, so that routines for whole-graph updates
(e.g. fused kernels) can be reused.
Parameters
----------
graph: DGLGraph
The DGLGraph to use
edge_tuple: tuple
edge_tuples: tuple
A tuple of (src ids, dst ids, edge ids) representing edges to perform
send_and_recv
message_func: callable or list of callable
......@@ -146,14 +155,23 @@ def schedule_snr(graph,
var_recv_nodes = var.IDX(recv_nodes, name='recv_nodes')
# generate send and reduce schedule
uv_getter = lambda: (var_u, var_v)
adj_creator = lambda: spmv.build_adj_matrix_uv((u, v), recv_nodes, graph.number_of_nodes())
inc_creator = lambda: spmv.build_inc_matrix_dst(v, recv_nodes)
reduced_feat = _gen_send_reduce(graph, graph._node_frame, graph._node_frame,
graph._edge_frame, message_func, reduce_func,
var_eid, var_recv_nodes,
uv_getter, adj_creator, inc_creator)
adj_creator = lambda: spmv.build_gidx_and_mapping_uv(
edge_tuples, graph.number_of_nodes())
out_map_creator = lambda nbits: _build_idx_map(recv_nodes, nbits)
reduced_feat = _gen_send_reduce(graph=graph,
src_node_frame=graph._node_frame,
dst_node_frame=graph._node_frame,
edge_frame=graph._edge_frame,
message_func=message_func,
reduce_func=reduce_func,
var_send_edges=var_eid,
var_reduce_nodes=var_recv_nodes,
uv_getter=uv_getter,
adj_creator=adj_creator,
out_map_creator=out_map_creator)
# generate apply schedule
final_feat = _apply_with_accum(graph, var_recv_nodes, var_nf, reduced_feat, apply_func)
final_feat = _apply_with_accum(graph, var_recv_nodes, var_nf, reduced_feat,
apply_func)
if inplace:
ir.WRITE_ROW_INPLACE_(var_nf, var_recv_nodes, final_feat)
else:
......@@ -182,9 +200,8 @@ def schedule_update_all(graph,
nodes = utils.toindex(slice(0, graph.number_of_nodes()))
schedule_apply_nodes(graph, nodes, apply_func, inplace=False)
else:
# TODO is the eid here correct?
eid = utils.toindex(slice(0, graph.number_of_edges())) # shortcut for ALL
recv_nodes = utils.toindex(slice(0, graph.number_of_nodes())) # shortcut for ALL
eid = utils.toindex(slice(0, graph.number_of_edges())) # ALL
recv_nodes = utils.toindex(slice(0, graph.number_of_nodes())) # ALL
# create vars
var_nf = var.FEAT_DICT(graph._node_frame, name='nf')
var_recv_nodes = var.IDX(recv_nodes, name='recv_nodes')
......@@ -193,14 +210,22 @@ def schedule_update_all(graph,
def uv_getter():
src, dst, _ = graph._graph.edges('eid')
return var.IDX(src), var.IDX(dst)
adj_creator = lambda: spmv.build_adj_matrix_graph(graph)
inc_creator = lambda: spmv.build_inc_matrix_graph(graph)
reduced_feat = _gen_send_reduce(graph, graph._node_frame, graph._node_frame,
graph._edge_frame, message_func, reduce_func,
var_eid, var_recv_nodes,
uv_getter, adj_creator, inc_creator)
adj_creator = lambda: spmv.build_gidx_and_mapping_graph(graph)
out_map_creator = lambda nbits: None
reduced_feat = _gen_send_reduce(graph=graph,
src_node_frame=graph._node_frame,
dst_node_frame=graph._node_frame,
edge_frame=graph._edge_frame,
message_func=message_func,
reduce_func=reduce_func,
var_send_edges=var_eid,
var_reduce_nodes=var_recv_nodes,
uv_getter=uv_getter,
adj_creator=adj_creator,
out_map_creator=out_map_creator)
# generate optional apply
final_feat = _apply_with_accum(graph, var_recv_nodes, var_nf, reduced_feat, apply_func)
final_feat = _apply_with_accum(graph, var_recv_nodes, var_nf,
reduced_feat, apply_func)
ir.WRITE_DICT_(var_nf, final_feat)
def schedule_apply_nodes(graph,
......@@ -224,8 +249,8 @@ def schedule_apply_nodes(graph,
-------
A list of executors for DGL Runtime
"""
var_nf = var.FEAT_DICT(graph._node_frame, name='nf')
var_v = var.IDX(v)
var_nf = var.FEAT_DICT(graph._node_frame, name='nf')
v_nf = ir.READ_ROW(var_nf, var_v)
def _afunc_wrapper(node_data):
nbatch = NodeBatch(graph, v, node_data)
......@@ -301,24 +326,17 @@ def schedule_apply_edges(graph,
A list of executors for DGL Runtime
"""
# vars
var_nf = var.FEAT_DICT(graph._node_frame, name='nf')
var_nf = var.FEAT_DICT(graph._node_frame)
var_ef = var.FEAT_DICT(graph._edge_frame)
var_out = _gen_send(graph=graph, u=u, v=v, eid=eid, mfunc=apply_func,
var_src_nf=var_nf, var_dst_nf=var_nf, var_ef=var_ef)
var_ef = var.FEAT_DICT(graph._edge_frame, name='ef')
var_u = var.IDX(u)
var_v = var.IDX(v)
var_eid = var.IDX(eid)
# schedule apply edges
fdsrc = ir.READ_ROW(var_nf, var_u)
fddst = ir.READ_ROW(var_nf, var_v)
fdedge = ir.READ_ROW(var_ef, var_eid)
def _efunc_wrapper(src_data, edge_data, dst_data):
ebatch = EdgeBatch(graph, (u, v, eid), src_data, edge_data, dst_data)
return apply_func(ebatch)
_efunc = var.FUNC(_efunc_wrapper)
new_fdedge = ir.EDGE_UDF(_efunc, fdsrc, fdedge, fddst)
if inplace:
ir.WRITE_ROW_INPLACE_(var_ef, var_eid, new_fdedge)
ir.WRITE_ROW_INPLACE_(var_ef, var_eid, var_out)
else:
ir.WRITE_ROW_(var_ef, var_eid, new_fdedge)
ir.WRITE_ROW_(var_ef, var_eid, var_out)
def schedule_nodeflow_apply_edges(graph, block_id,
u, v, eid,
......@@ -349,25 +367,16 @@ def schedule_nodeflow_apply_edges(graph, block_id,
"""
# vars
in_var_nf = var.FEAT_DICT(graph._get_node_frame(block_id), name='in_nf')
out_var_nf = var.FEAT_DICT(graph._get_node_frame(block_id + 1), name='out_nf')
out_var_nf = var.FEAT_DICT(graph._get_node_frame(block_id + 1),
name='out_nf')
var_ef = var.FEAT_DICT(graph._get_edge_frame(block_id), name='ef')
var_u = var.IDX(u)
var_v = var.IDX(v)
var_out = _gen_send(graph, u, v, eid, apply_func, in_var_nf, out_var_nf,
var_ef)
var_eid = var.IDX(eid)
# schedule apply edges
fdsrc = ir.READ_ROW(in_var_nf, var_u)
fddst = ir.READ_ROW(out_var_nf, var_v)
fdedge = ir.READ_ROW(var_ef, var_eid)
def _efunc_wrapper(src_data, edge_data, dst_data):
ebatch = EdgeBatch(graph, (u, v, eid), src_data, edge_data, dst_data)
return apply_func(ebatch)
_efunc = var.FUNC(_efunc_wrapper)
new_fdedge = ir.EDGE_UDF(_efunc, fdsrc, fdedge, fddst)
# TODO we need to avoid index_copy here.
if inplace:
ir.WRITE_ROW_INPLACE_(var_ef, var_eid, new_fdedge)
ir.WRITE_ROW_INPLACE_(var_ef, var_eid, var_out)
else:
ir.WRITE_ROW_(var_ef, var_eid, new_fdedge)
ir.WRITE_ROW_(var_ef, var_eid, var_out)
def schedule_push(graph,
u,
......@@ -441,12 +450,14 @@ def schedule_pull(graph,
var_eid = var.IDX(eid)
# generate send and reduce schedule
uv_getter = lambda: (var_u, var_v)
adj_creator = lambda: spmv.build_adj_matrix_uv((u, v), pull_nodes, graph.number_of_nodes())
inc_creator = lambda: spmv.build_inc_matrix_dst(v, pull_nodes)
reduced_feat = _gen_send_reduce(graph, graph._node_frame, graph._node_frame,
graph._edge_frame, message_func, reduce_func,
var_eid, var_pull_nodes,
uv_getter, adj_creator, inc_creator)
num_nodes = graph.number_of_nodes()
adj_creator = lambda: spmv.build_gidx_and_mapping_uv((u, v, eid), num_nodes)
out_map_creator = lambda nbits: _build_idx_map(pull_nodes, nbits)
reduced_feat = _gen_send_reduce(graph, graph._node_frame,
graph._node_frame, graph._edge_frame,
message_func, reduce_func, var_eid,
var_pull_nodes, uv_getter, adj_creator,
out_map_creator)
# generate optional apply
final_feat = _apply_with_accum(graph, var_pull_nodes, var_nf, reduced_feat, apply_func)
if inplace:
......@@ -486,7 +497,6 @@ def schedule_group_apply_edge(graph,
var_nf = var.FEAT_DICT(graph._node_frame, name='nf')
var_ef = var.FEAT_DICT(graph._edge_frame, name='ef')
var_out = var.FEAT_DICT(name='new_ef')
# TODO (lingfan): check if apply_func is a DGL builtin
db.gen_group_apply_edge_schedule(graph, apply_func, u, v, eid, group_by,
var_nf, var_ef, var_out)
var_eid = var.IDX(eid)
......@@ -518,25 +528,29 @@ def schedule_nodeflow_update_all(graph,
"""
# A NodeFlow shouldn't have 0 edges.
assert graph.block_size(block_id) > 0
eid = utils.toindex(slice(0, graph.block_size(block_id))) # shortcut for ALL
dest_nodes = utils.toindex(slice(0, graph.layer_size(block_id + 1))) # shortcut for ALL
eid = utils.toindex(slice(0, graph.block_size(block_id))) # ALL
dest_nodes = utils.toindex(slice(0, graph.layer_size(block_id + 1))) # ALL
# create vars
var_nf = var.FEAT_DICT(graph._get_node_frame(block_id + 1), name='out_nf')
var_dest_nodes = var.IDX(dest_nodes, name='dest_nodes')
var_eid = var.IDX(eid)
# generate send + reduce
def uv_getter():
# TODO get all edges in the block.
src, dst, _ = graph.block_edges(block_id)
src, dst, _ = graph.block_edges(block_id, remap=True)
return var.IDX(utils.toindex(src)), var.IDX(utils.toindex(dst))
adj_creator = lambda: spmv.build_block_adj_matrix_graph(graph, block_id)
inc_creator = lambda: spmv.build_block_inc_matrix_graph(graph, block_id)
reduced_feat = _gen_send_reduce(graph, graph._get_node_frame(block_id),
graph._get_node_frame(block_id + 1),
graph._get_edge_frame(block_id),
message_func, reduce_func,
var_eid, var_dest_nodes,
uv_getter, adj_creator, inc_creator)
adj_creator = lambda: spmv.build_gidx_and_mapping_block(graph, block_id)
out_map_creator = lambda nbits: None
reduced_feat = _gen_send_reduce(graph=graph,
src_node_frame=graph._get_node_frame(block_id),
dst_node_frame=graph._get_node_frame(block_id + 1),
edge_frame=graph._get_edge_frame(block_id),
message_func=message_func,
reduce_func=reduce_func,
var_send_edges=var_eid,
var_reduce_nodes=var_dest_nodes,
uv_getter=uv_getter,
adj_creator=adj_creator,
out_map_creator=out_map_creator)
# generate optional apply
final_feat = _apply_with_accum(graph, var_dest_nodes, var_nf, reduced_feat, apply_func)
ir.WRITE_DICT_(var_nf, final_feat)
......@@ -550,7 +564,7 @@ def schedule_nodeflow_compute(graph,
reduce_func,
apply_func,
inplace):
"""get flow compute schedule in NodeFlow
"""Get flow compute schedule in NodeFlow
Parameters
----------
......@@ -564,6 +578,8 @@ def schedule_nodeflow_compute(graph,
Destination nodes of edges to apply
eid : utils.Index
Ids of sending edges
dest_nodes : utils.Index
Destination node ids
message_func: callable or list of callable
The message function
reduce_func: callable or list of callable
......@@ -579,27 +595,36 @@ def schedule_nodeflow_compute(graph,
if len(eid) == 0:
# All the nodes are 0deg; downgrades to apply.
if apply_func is not None:
schedule_nodeflow_apply_nodes(graph, block_id + 1, dest_nodes, apply_func, inplace)
schedule_nodeflow_apply_nodes(graph, block_id + 1, dest_nodes,
apply_func, inplace)
else:
# create vars
var_nf = var.FEAT_DICT(graph._get_node_frame(block_id + 1), name='out_nf')
var_dest_nodes = var.IDX(dest_nodes, name='dest_nodes')
var_nf = var.FEAT_DICT(graph._get_node_frame(block_id + 1),
name='out_nf')
var_u = var.IDX(u)
var_v = var.IDX(v)
var_eid = var.IDX(eid)
var_dest_nodes = var.IDX(dest_nodes, name='dest_nodes')
# generate send and reduce schedule
uv_getter = lambda: (var_u, var_v)
adj_creator = lambda: spmv.build_adj_matrix_uv((u, v), dest_nodes,
graph.layer_size(block_id))
inc_creator = lambda: spmv.build_inc_matrix_dst(v, dest_nodes)
reduced_feat = _gen_send_reduce(graph, graph._get_node_frame(block_id),
graph._get_node_frame(block_id + 1),
graph._get_edge_frame(block_id),
message_func, reduce_func,
var_eid, var_dest_nodes,
uv_getter, adj_creator, inc_creator)
adj_creator = lambda: spmv.build_gidx_and_mapping_block(
graph, block_id, (u, v, eid))
out_map_creator = lambda nbits: _build_idx_map(utils.toindex(dest_nodes), nbits)
reduced_feat = _gen_send_reduce(graph=graph,
src_node_frame=graph._get_node_frame(block_id),
dst_node_frame=graph._get_node_frame(block_id + 1),
edge_frame=graph._get_edge_frame(block_id),
message_func=message_func,
reduce_func=reduce_func,
var_send_edges=var_eid,
var_reduce_nodes=var_dest_nodes,
uv_getter=uv_getter,
adj_creator=adj_creator,
out_map_creator=out_map_creator)
# generate optional apply
final_feat = _apply_with_accum(graph, var_dest_nodes, var_nf, reduced_feat, apply_func)
final_feat = _apply_with_accum(graph, var_dest_nodes, var_nf,
reduced_feat, apply_func)
if inplace:
ir.WRITE_ROW_INPLACE_(var_nf, var_dest_nodes, final_feat)
else:
......@@ -619,8 +644,8 @@ def _standardize_func_usage(func, func_name):
2. a dgl builtin function
3. a list of dgl builtin function
This function checks if func meets the requirement, and merges last two cases
by putting builtin function in case 2 into a list
This function checks whether ``func`` meets the requirement, and merges the
last two cases by putting the builtin function in case 2 into a list.
Returns:
One single UDF function or a list of builtin function
......@@ -660,6 +685,7 @@ def _apply_with_accum(graph, var_nodes, var_nf, var_accum, apply_func):
# features and "merge" it with the reduced features.
v_nf = ir.READ_ROW(var_nf, var_nodes)
v_nf = ir.UPDATE_DICT(v_nf, var_accum)
def _afunc_wrapper(node_data):
nbatch = NodeBatch(graph, var_nodes.data, node_data)
return apply_func(nbatch)
......@@ -671,13 +697,21 @@ def _apply_with_accum(graph, var_nodes, var_nf, var_accum, apply_func):
return final_feat
def _gen_reduce(graph, reduce_func, edge_tuples, recv_nodes):
"""
"""Generate reduce schedule
Parameters
----------
graph : DGLGraph
reduce_func : callable
edge_tuples : tuple of utils.Index
recv_nodes : utils.Index
Returns
-------
var.FEAT_DICT
The reduced feature dict.
"""
_, dst, eid = edge_tuples
src, dst, eid = edge_tuples
rfunc = _standardize_func_usage(reduce_func, 'reduce')
rfunc_is_list = utils.is_iterable(rfunc)
# Create a tmp frame to hold the feature data.
......@@ -693,24 +727,25 @@ def _gen_reduce(graph, reduce_func, edge_tuples, recv_nodes):
var_out = var.FEAT_DICT(data=tmpframe)
if rfunc_is_list:
# UDF message + builtin reducer
# analyze e2v spmv
spmv_rfunc, rfunc = spmv.analyze_e2v_spmv(graph, rfunc)
inc = spmv.build_inc_matrix_eid(graph._msg_frame.num_rows, eid, dst,
recv_nodes)
spmv.gen_e2v_spmv_schedule(inc, spmv_rfunc, var_msg, var_out)
if len(rfunc) == 0:
# All mfunc and rfunc has been processed.
return var_out
# convert the remaining rfunc to UDFs
rfunc = BundledFunction(rfunc)
# gen degree bucketing schedule for UDF recv
db.gen_degree_bucketing_schedule(graph, rfunc, eid, dst,
recv_nodes, var_nf, var_msg, var_out)
return var_out
num_nodes = graph.number_of_nodes()
adj, edge_map, nbits = spmv.build_gidx_and_mapping_uv(
(src, dst, eid), num_nodes)
# using edge map instead of message map because messages are in global
# message frame
var_out_map = _build_idx_map(recv_nodes, nbits)
spmv.gen_e2v_spmv_schedule(graph=adj,
rfunc=rfunc,
message_frame=var_msg,
out=var_out,
out_size=len(recv_nodes),
edge_map=edge_map,
out_map=var_out_map)
return var_out
else:
# gen degree bucketing schedule for UDF recv
db.gen_degree_bucketing_schedule(graph, rfunc, eid, dst, recv_nodes,
var_nf, var_msg, var_out)
return var_out
def _gen_send_reduce(
graph,
......@@ -723,11 +758,24 @@ def _gen_send_reduce(
var_reduce_nodes,
uv_getter,
adj_creator,
inc_creator):
out_map_creator):
"""Generate send and reduce schedule.
This guarantees that the returned reduced features are batched
in the *unique-ascending* order of the edge destination node ids.
The function generates symbolic program for computing
(1) message function on the given edges (var_send_edges).
(2) reduce function on the given nodes (var_reduce_nodes).
If both message_func and reduce_func are DGL builtin functions, the schedule
will invoke fused message passing kernels (e.g. dgl.backend.binary_reduce) to
avoid generating explicit edge messages.
If message_func is a UDF while reduce_func is a DGL builtin function, the
schedule first invokes the UDF to generate explicit edge messages, and then
invokes dgl.backend.copy_reduce to reduce the messages on the destination
nodes.
If both message_func and reduce_func are UDFs, the schedule first invokes the
message UDF to generate explicit edge messages and then uses degree bucketing
to invoke the reduce UDF.
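For example (a sketch, assuming ``g`` is a DGLGraph with a node feature
``'h'``), the following two calls take the first and the second path
respectively:
>>> # builtin message + builtin reduce: fused kernel, no explicit messages
>>> g.update_all(fn.copy_src('h', 'm'), fn.sum('m', 'h_sum'))
>>> # UDF message + builtin reduce: messages are materialized, then reduced
>>> # with the copy-reduce kernel
>>> g.update_all(lambda edges: {'m': edges.src['h']}, fn.sum('m', 'h_sum'))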
Parameters
----------
......@@ -746,26 +794,36 @@ def _gen_send_reduce(
var_send_edges : var.IDX
The edges (ids) to perform send.
var_reduce_nodes : var.IDX
The nodes to perform reduce. This should include unique(v) + 0deg nodes.
Unique and sorted nodes to perform reduce. This should include
unique(v) + 0deg nodes.
uv_getter : callable
A function that returns a pair of var.IDX (u, v) for the triggered edges.
Function that returns a pair of var.IDX (u, v) for the triggered edges.
adj_creator : callable
A function that returns the adjmat and the shuffle index.
inc_creator : callable
A function that returns the incmat and the shuffle index.
Function that returns the adjmat, edge order of csr matrix, and
bit-width.
out_map_creator: callable
A function that returns a mapping from reduce_nodes to relabeled
consecutive ids
Returns
-------
var.FEAT_DICT
The reduced feature dict.
Notes
-----
Reduce_nodes are assumed to be in the *unique-ascending* order of the edge
destination node ids. The returned reduced features will be batched
following the order of reduce_nodes.
"""
# NOTE: currently, this function requires all var.IDX to contain concrete data.
# NOTE: currently, this function requires all var.IDX to contain concrete
# data.
reduce_nodes = var_reduce_nodes.data
# arg vars
var_src_nf = var.FEAT_DICT(src_node_frame, name='nf')
var_dst_nf = var.FEAT_DICT(dst_node_frame, name='nf')
var_ef = var.FEAT_DICT(edge_frame, name='ef')
var_src_nf = var.FEAT_DICT(src_node_frame, name='src_frame')
var_dst_nf = var.FEAT_DICT(dst_node_frame, name='dst_frame')
var_ef = var.FEAT_DICT(edge_frame, name='edge_frame')
var_eid = var_send_edges
# format the input functions
......@@ -774,61 +832,86 @@ def _gen_send_reduce(
mfunc_is_list = utils.is_iterable(mfunc)
rfunc_is_list = utils.is_iterable(rfunc)
# Create a tmp frame to hold the feature data.
# The frame has the same size and schemes of the
# node frame.
# TODO(minjie): should replace this with an IR call to make the program stateless.
# Create a tmp frame to hold the feature data. The frame has the same size
# and schemes of the node frame.
# TODO(minjie): should replace this with an IR call to make the program
# stateless.
tmpframe = FrameRef(frame_like(dst_node_frame._frame, len(reduce_nodes)))
var_out = var.FEAT_DICT(data=tmpframe)
# 1. If either mfunc or rfunc is builtin, generate adjmat, edge mapping and
# message mapping
if mfunc_is_list or rfunc_is_list:
adj, edge_map, nbits = adj_creator()
# 2. If rfunc is builtin, generate a mapping from recv nodes to consecutive
# output id
if rfunc_is_list:
out_map = out_map_creator(nbits)
# 3. First try fused message and reduce function
if mfunc_is_list and rfunc_is_list:
# builtin message + builtin reducer
# analyze v2v spmv
spmv_pairs, mfunc, rfunc = spmv.analyze_v2v_spmv(graph, mfunc, rfunc)
adj = adj_creator()
spmv.gen_v2v_spmv_schedule(adj, spmv_pairs, var_src_nf, var_ef, var_eid, var_out)
spmv.gen_v2v_spmv_schedule(graph=adj,
mfunc=mfunc,
rfunc=rfunc,
src_frame=var_src_nf,
dst_frame=var_dst_nf,
edge_frame=var_ef,
out=var_out,
out_size=len(reduce_nodes),
edge_map=edge_map,
out_map=out_map)
return var_out
if len(mfunc) == 0:
# All mfunc and rfunc have been converted to v2v spmv.
return var_out
var_u, var_v = uv_getter()
# 4. Unable to fuse, then generate message
if mfunc_is_list:
# Two cases:
# - mfunc is builtin while rfunc is UDF.
# - mfunc and rfunc are both builtin but some combinations
# fall through from the v2v spmv analysis.
# In both cases, convert the mfunc to UDF.
mfunc = BundledFunction(mfunc)
# generate UDF send schedule
var_u, var_v = uv_getter()
var_mf = _gen_send(graph, var_src_nf, var_dst_nf, var_ef, var_u, var_v, var_eid, mfunc)
# messages are builtin but reduce is UDF
# Create a tmp frame to hold the message.
# TODO: should replace this with an IR call to make the program
# stateless.
n_message = len(var_eid.data)
tmp_msg_frame = FrameRef(frame_like(edge_frame._frame, n_message))
var_mf = var.FEAT_DICT(data=tmp_msg_frame)
spmv.gen_v2e_spmv_schedule(graph=adj,
mfunc=mfunc,
src_frame=var_src_nf,
dst_frame=var_dst_nf,
edge_frame=var_ef,
out=var_mf,
out_size=n_message,
edge_map=edge_map)
else:
# generate UDF send schedule
var_mf = _gen_udf_send(graph, var_src_nf, var_dst_nf, var_ef, var_u,
var_v, var_eid, mfunc)
# 6. Generate reduce
if rfunc_is_list:
# UDF message + builtin reducer
# analyze e2v spmv
spmv_rfunc, rfunc = spmv.analyze_e2v_spmv(graph, rfunc)
inc = inc_creator()
spmv.gen_e2v_spmv_schedule(inc, spmv_rfunc, var_mf, var_out)
if len(rfunc) == 0:
# All mfunc and rfunc has been processed.
return var_out
# convert the remaining rfunc to UDFs
rfunc = BundledFunction(rfunc)
# gen degree bucketing schedule for UDF recv
mid = utils.toindex(slice(0, len(var_v.data))) # message id is from 0~|dst|
db.gen_degree_bucketing_schedule(
graph, rfunc, mid, var_v.data, reduce_nodes, var_dst_nf, var_mf, var_out)
return var_out
def _gen_send(graph, src_nfr, dst_nfr, efr, u, v, eid, mfunc):
"""Internal function to generate send schedule."""
fdsrc = ir.READ_ROW(src_nfr, u)
fddst = ir.READ_ROW(dst_nfr, v)
fdedge = ir.READ_ROW(efr, eid)
spmv.gen_e2v_spmv_schedule(graph=adj,
rfunc=rfunc,
message_frame=var_mf,
out=var_out,
out_size=len(reduce_nodes),
edge_map=None, # messages are stored compactly
out_map=out_map)
return var_out
else:
# gen degree bucketing schedule for UDF recv
mid = utils.toindex(slice(0, len(var_v.data)))
db.gen_degree_bucketing_schedule(graph, rfunc, mid, var_v.data,
reduce_nodes, var_dst_nf, var_mf,
var_out)
return var_out
def _gen_udf_send(graph, var_src_nf, var_dst_nf, var_ef, u, v, eid, mfunc):
"""Internal function to generate send schedule for UDF message function."""
fdsrc = ir.READ_ROW(var_src_nf, u)
fddst = ir.READ_ROW(var_dst_nf, v)
fdedge = ir.READ_ROW(var_ef, eid)
def _mfunc_wrapper(src_data, edge_data, dst_data):
ebatch = EdgeBatch(graph, (u.data, v.data, eid.data),
src_data, edge_data, dst_data)
......@@ -837,4 +920,75 @@ def _gen_send(graph, src_nfr, dst_nfr, efr, u, v, eid, mfunc):
msg = ir.EDGE_UDF(_mfunc_wrapper, fdsrc, fdedge, fddst)
return msg
def _gen_send(graph, u, v, eid, mfunc, var_src_nf, var_dst_nf, var_ef):
"""Internal function to generate send schedule"""
mfunc = _standardize_func_usage(mfunc, 'message')
mfunc_is_list = utils.is_iterable(mfunc)
# vars
var_u = var.IDX(u)
var_v = var.IDX(v)
var_eid = var.IDX(eid)
if mfunc_is_list:
if eid.is_slice(0, graph.number_of_edges()):
# full graph case
res = spmv.build_gidx_and_mapping_graph(graph)
else:
num_nodes = graph.number_of_nodes()
res = spmv.build_gidx_and_mapping_uv((u, v, eid), num_nodes)
adj, edge_map, _ = res
# create a tmp message frame
tmp_mfr = FrameRef(frame_like(graph._edge_frame._frame, len(eid)))
var_out = var.FEAT_DICT(data=tmp_mfr)
spmv.gen_v2e_spmv_schedule(graph=adj,
mfunc=mfunc,
src_frame=var_src_nf,
dst_frame=var_dst_nf,
edge_frame=var_ef,
out=var_out,
out_size=len(eid),
edge_map=edge_map)
else:
# UDF send
var_out = _gen_udf_send(graph, var_src_nf, var_dst_nf, var_ef, var_u,
var_v, var_eid, mfunc)
return var_out
def _build_idx_map(idx, nbits):
"""Build a map from the input ids to continuous ids that starts from zero.
And the number of bits data type of each integer in the mapping uses will
be nbits
Examples
--------
>>> x = [1, 5, 3, 6]
>>> o2n = map_to_continuous(x)
>>> o2n
[n/a, 0, n/a, 2, n/a, 1, 3]
"n/a" will be filled with 0
Parameters
----------
idx : utils.Index
The input ids, assumed to be unique.
nbits: int
Number of bits each integer in the mapping should use, can be 32 or 64
Returns
-------
old_to_new : CtxCachedObject
The mapping from old id to new id. It is a vector of length MAX(x) + 1.
One can use advanced indexing to convert an old id tensor to a
new id tensor: new_id = old_to_new[old_id]
"""
x = idx.tousertensor()
map_len = int(F.asnumpy(F.max(x, dim=0))) + 1
old_to_new = F.full_1d(map_len, -1, dtype=F.int64, ctx=F.cpu())
F.scatter_row_inplace(old_to_new, x, F.arange(0, len(x)))
old_to_new = utils.to_nbits_int(old_to_new, nbits)
old_to_new = F.zerocopy_to_dgl_ndarray(old_to_new)
return utils.CtxCachedObject(lambda ctx: nd.array(old_to_new, ctx=ctx))
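A minimal PyTorch sketch of the mapping built above (values match the docstring example; the nbits conversion and context caching are omitted):

import torch

old_ids = torch.tensor([1, 5, 3, 6])
old_to_new = torch.full((int(old_ids.max()) + 1,), -1, dtype=torch.int64)
old_to_new[old_ids] = torch.arange(len(old_ids))
print(old_to_new)                          # tensor([-1,  0, -1,  2, -1,  1,  3])
print(old_to_new[torch.tensor([5, 6])])    # tensor([1, 3])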
_init_api("dgl.runtime.scheduler")
......@@ -4,159 +4,126 @@ from __future__ import absolute_import
from ..base import DGLError
from .. import backend as F
from .. import utils
from .. import ndarray as nd
from ..graph_index import from_coo
from . import ir
from .ir import var
def analyze_v2v_spmv(graph, mfunc, rfunc):
"""Analyze if SPMV from node space to node space can be applied.
def gen_v2v_spmv_schedule(graph, mfunc, rfunc, src_frame, dst_frame,
edge_frame, out, out_size, src_map=None,
dst_map=None, edge_map=None, out_map=None):
"""Generate v2v spmv schedule.
Parameters
----------
graph: DGLGraph
DGLGraph to use
mfunc : list of dgl.function.BuiltinFunction
The message function list.
rfunc : list of dgl.function.BuiltinFunction
The reduce function list.
Returns
-------
spmv_pairs : list of pair of builtin functions
The pairs of spmv-applicable message/reduce functions.
mfunc_left: list
A list of message functions that can't use v2v spmv. In other
words, these message functions need to be materialized
rfunc_left: list
A list of reduce functions that can't use v2v spmv
graph : utils.CtxCachedObject
Function that generates immutable graph index on given context
mfunc : list of builtin message func
Builtin message function list
rfunc : list of builtin reduce func
Builtin reduce function list
src_frame : var.Var
Input source node features
dst_frame : var.Var
Input destination node features
edge_frame : var.Var
Input edge features
out : var.Var
Output node features
out_size : int
Number of output nodes
src_map : utils.CtxCachedObject
Function that generates source node id mapping array on given context
dst_map : utils.CtxCachedObject
Function that generates destination node id mapping array on given
context
edge_map : utils.CtxCachedObject
Function that generates edge id mapping array on given context
out_map : utils.CtxCachedObject
Function that generates output id mapping array on given context
"""
spmv_pairs = []
mfunc_left = []
rfunc_left = []
fld2mfunc = {fn.out_field: fn for fn in mfunc}
touched_mfld = set()
for rfn in rfunc:
mfld = rfn.msg_field
if mfld not in fld2mfunc:
raise DGLError('Reduce function requires message field "%s",'
' but no message function generates it.' % mfld)
mfn = fld2mfunc[mfld]
# TODO(minjie): should pre-compile a look up table
if mfn.is_spmv_supported(graph) and rfn.is_spmv_supported():
spmv_pairs.append((mfn, rfn))
else:
if mfld not in touched_mfld:
touched_mfld.add(mfld)
mfunc_left.append(mfn)
rfunc_left.append(rfn)
return spmv_pairs, mfunc_left, rfunc_left
def analyze_e2v_spmv(graph, rfunc): # pylint: disable=unused-argument
"""Analyze if SPMV from edge space to node space can be applied.
Parameters
----------
graph: DGLGraph
DGLGraph to use
rfunc : list of dgl.function.BuiltinFunction
The reduce function list.
ftdst = mfn._invoke(graph, src_frame, dst_frame, edge_frame, out_size,
src_map, dst_map, edge_map, out_map,
reducer=rfn.name)
ir.WRITE_COL_(out, var.STR(rfn.out_field), ftdst)
Returns
-------
spmv_rfunc : list
A list of spmv-applicable reduce builtins.
rfunc_left : list
A list of reduce builtins that are not applicable
"""
spmv_rfunc = []
rfunc_left = []
for rfn in rfunc:
if rfn.is_spmv_supported():
spmv_rfunc.append(rfn)
else:
rfunc_left.append(rfn)
return spmv_rfunc, rfunc_left
def gen_v2v_spmv_schedule(adj, spmv_pairs, nft, eft, eid, out):
"""Generate v2v spmv schedule.
def gen_v2e_spmv_schedule(graph, mfunc, src_frame, dst_frame, edge_frame, out,
out_size, src_map=None, dst_map=None, edge_map=None,
out_map=None):
"""Generate v2e SPMV schedule
Parameters
----------
adj : tuple (sparse matrix, utils.Index)
spmv_pairs : list of pair
nft : var.Var
input node features
eft : var.Var
input edge features
eid : var.Var
eid index
graph : utils.CtxCachedObject
Function that generates immutable graph index on given context
mfunc : list of builtin message func
Builtin message function list
src_frame : var.Var
Input source node features
dst_frame : var.Var
Input destination node features
edge_frame : var.Var
Input edge features
out : var.Var
output node features
Output node features
out_size : int
Number of output nodes
src_map : utils.CtxCachedObject
Function that generates source node id mapping array on given context
dst_map : utils.CtxCachedObject
Function that generates destination node id mapping array on given
context
edge_map : utils.CtxCachedObject
Function that generates edge id mapping array on given context
out_map : utils.CtxCachedObject
Function that generates output id mapping array on given context
"""
adjmat, shuffle_idx = adj
adj_var = var.SPMAT(adjmat)
if shuffle_idx is not None:
new_eid = utils.reorder_index(eid.data, shuffle_idx)
eid = var.IDX(new_eid)
for mfn, rfn in spmv_pairs:
if mfn.use_edge_feature:
ftedge = ir.READ(eft, eid, var.STR(mfn.edge_field))
ftsrc = ir.READ_COL(nft, var.STR(mfn.src_field))
ftdst = ir.SPMV_WITH_DATA(adj_var, ftedge, ftsrc)
else:
ftsrc = ir.READ_COL(nft, var.STR(mfn.src_field))
ftdst = ir.SPMV(adj_var, ftsrc)
# save for merge
ir.WRITE_COL_(out, var.STR(rfn.out_field), ftdst)
for mfn in mfunc:
fmsg = mfn._invoke(graph, src_frame, dst_frame, edge_frame, out_size,
src_map, dst_map, edge_map, out_map=out_map,
reducer="none")
ir.WRITE_COL_(out, var.STR(mfn.out_field), fmsg)
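Each builtin message function above is executed by the kernel with reducer="none", so it materializes one output row per edge. A hedged numpy sketch of the simplest such builtin (copy_src), with the src/edge/out id-mapping handling omitted:

import numpy as np

def v2e_copy_src_sketch(src_feat, src_ids):
    # One message row per edge, gathered from the source endpoint's feature row.
    return src_feat[src_ids]

src_feat = np.arange(12.0).reshape(4, 3)            # 4 nodes, 3-dim features
src_ids = np.array([0, 0, 2])                       # source endpoint of each of 3 edges
messages = v2e_copy_src_sketch(src_feat, src_ids)   # shape (3, 3)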
def gen_e2v_spmv_schedule(inc, spmv_rfunc, mfr, out):
def gen_e2v_spmv_schedule(graph, rfunc, message_frame, out, out_size,
edge_map=None, out_map=None):
"""Generate e2v SPMV schedule.
Parameters
----------
inc : tuple (sparse matrix, utils.Index)
spmv_rfunc : list of builtin reducers
mfr : var.Var
Variable for message frame.
graph : utils.CtxCachedObject
Function that generates immutable graph index on given context
rfunc : list of builtin reduce func
Builtin reduce function list
message_frame : var.Var
Message features
out : var.Var
Variable for output reduced features.
Output node features
out_size : int
Number of output nodes
edge_map : utils.CtxCachedObject
Function that generates edge id mapping array on given context
out_map : utils.CtxCachedObject
Function that generates output id mapping array on given context
"""
incmat, _ = inc
inc_var = var.SPMAT(incmat)
for rfn in spmv_rfunc:
ftmsg = ir.READ_COL(mfr, var.STR(rfn.msg_field))
ftdst = ir.SPMV(inc_var, ftmsg)
for rfn in rfunc:
ftdst = rfn._invoke(graph, message_frame, out_size, edge_map=edge_map,
out_map=out_map)
ir.WRITE_COL_(out, var.STR(rfn.out_field), ftdst)
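Each builtin reducer above consumes the compactly stored message frame and writes one row per output node, which is conceptually a segmented reduction. A hedged numpy sketch of the sum case (hypothetical names; the edge_map/out_map reindexing is omitted):

import numpy as np

def e2v_sum_sketch(messages, dst_ids, num_out):
    # Scatter-add each message row into its destination row.
    out = np.zeros((num_out, messages.shape[1]))
    np.add.at(out, dst_ids, messages)
    return out

messages = np.ones((3, 2))
dst_ids = np.array([1, 1, 3])
reduced = e2v_sum_sketch(messages, dst_ids, num_out=4)   # rows 0 and 2 stay zero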
def build_block_adj_matrix_graph(graph, block_id):
"""Build adjacency matrix of the whole graph.
Parameters
----------
graph : NodeFlow
The NodeFlow
block_id : int
the block Id
Returns
-------
utils.CtxCachedObject
Can be used to get the adjacency matrix on the provided ctx.
utils.Index
An index for data shuffling due to sparse format change. Returns None
if shuffling is not required.
"""
#TODO why is this constructed twice?
_, shuffle_idx = graph.block_adjacency_matrix(block_id, F.cpu())
shuffle_idx = utils.toindex(shuffle_idx) if shuffle_idx is not None else None
return lambda ctx: graph.block_adjacency_matrix(block_id, ctx)[0], shuffle_idx
def build_adj_matrix_graph(graph):
"""Build adjacency matrix of the whole graph.
def build_gidx_and_mapping_graph(graph):
"""Build immutable graph index of the whole graph.
Parameters
----------
......@@ -165,236 +132,86 @@ def build_adj_matrix_graph(graph):
Returns
-------
utils.CtxCachedObject
Can be used to get the adjacency matrix on the provided ctx.
utils.Index
An index for data shuffling due to sparse format change. Returns None
if shuffling is not required.
"""
gidx = graph._graph
# TODO Why invoking adjacency_matrix twice?
_, shuffle_idx = gidx.adjacency_matrix(False, F.cpu())
return lambda ctx: gidx.adjacency_matrix(False, ctx)[0], shuffle_idx
def _build_adj_matrix_index_uv(edges, reduce_nodes, num_sources):
"""Build adj matrix index and shape using the given (u, v) edges.
The matrix is of shape (len(reduce_nodes), n), where n is the number of nodes
in the graph. Therefore, when doing SPMV, the src node data
should be all the node features.
The dst nodes will be sorted in the *unique-ascending* order of
their ids. This is compatible with other reduce schedulers such as the
degree-bucketing scheduler.
Parameters
----------
edges : tuple of utils.Index
(u, v)
reduce_nodes : utils.Index
The nodes to reduce messages, which will be target dimension
of the adjmat. The nodes include unique(v) and zero-degree-nodes.
num_sources : int
The number of source nodes.
Returns
-------
sparse index
The sparse index.
tuple of int
The dense shape.
"""
# TODO(minjie): add node frontier for this
_, old2new = utils.build_relabel_map(reduce_nodes, is_sorted=True)
u, v = edges
u = u.tousertensor()
v = v.tousertensor()
new_v = old2new[v] # FIXME(minjie): no use []
n = num_sources
m = len(reduce_nodes)
row = F.unsqueeze(new_v, 0)
col = F.unsqueeze(u, 0)
idx = F.cat([row, col], dim=0)
return ('coo', idx), (m, n)
def build_adj_matrix_uv(edges, reduce_nodes, num_sources):
"""Build adj matrix using the given (u, v) edges and target nodes.
The matrix is of shape (len(reduce_nodes), n), where n is the number of nodes
in the graph. Therefore, when doing SPMV, the src node data
should be all the node features.
Parameters
----------
edges : tuple of utils.Index
(u, v)
reduce_nodes : utils.Index
The nodes to reduce messages, which will be target dimension
of the adjmat. The nodes include unique(v) and zero-degree-nodes.
num_sources : int
The number of source nodes.
Returns
-------
utils.CtxCachedObject
Can be used to get the adjacency matrix on the provided ctx.
utils.Index
An index for data shuffling due to sparse format change. Returns None
if shuffling is not required.
"""
sp_idx, shape = _build_adj_matrix_index_uv(edges, reduce_nodes, num_sources)
u, _ = edges
nnz = len(u)
# FIXME(minjie): data type
dat = F.ones((nnz,), dtype=F.float32, ctx=F.cpu())
mat, shuffle_idx = F.sparse_matrix(dat, sp_idx, shape)
shuffle_idx = utils.toindex(shuffle_idx) if shuffle_idx is not None else None
return utils.CtxCachedObject(lambda ctx: F.copy_to(mat, ctx)), shuffle_idx
def build_block_inc_matrix_graph(graph, block_id):
"""Build incidence matrix.
Parameters
----------
graph : NodeFlow
The NodeFlow.
block_id : int
The block Id
Returns
-------
utils.CtxCachedObject
Can be used to get the incidence matrix on the provided ctx.
utils.Index
An index for data shuffling due to sparse format change. Returns None
if shuffling is not required.
"""
# inc mat will not use data tensor so conversion index is not needed
return lambda ctx: graph.block_incidence_matrix(block_id, 'in', ctx)[0], None
def build_inc_matrix_graph(graph):
"""Build incidence matrix.
Parameters
----------
graph : DGLGraph
The graph.
Returns
-------
utils.CtxCachedObject
Can be used to get the incidence matrix on the provided ctx.
utils.Index
An index for data shuffling due to sparse format change. Returns None
if shuffling is not required.
graph : utils.CtxCachedObject
Function that generates an immutable graph index on given context
edge_map : utils.CtxCachedObject
Function that generates forward and backward edge mapping on given
context
nbits : int
Number of bits needed to represent the graph
"""
gidx = graph._graph
# inc mat will not use data tensor so conversion index is not needed
return lambda ctx: gidx.incidence_matrix('in', ctx)[0], None
def build_inc_matrix_eid(m, eid, dst, reduce_nodes):
"""Build incidence matrix using edge id and edge dst nodes.
The incidence matrix is of shape (n, m), where n=len(reduce_nodes).
The nnz is equal to len(eid).
return gidx.get_immutable_gidx, None, gidx.bits_needed()
Invariant: len(eid) == len(dst)
The dst nodes will be sorted in the *unique-ascending* order of
their ids. This is compatible with other reduce schedulers such as the
degree-bucketing scheduler.
def build_gidx_and_mapping_uv(edge_tuples, num_nodes):
"""Build immutable graph index and mapping using the given (u, v) edges
Examples
--------
Total of seven edges. Three edges point to node 1 (eid=0,1,2);
two point to node 3 (eid=3,4); two point to node 4 (eid=5,6).
Only five edges should be included in the result incmat (eid=1,2,3,5,6).
There are five nodes in the final target dimension (0~4),
where node 0 and 2 are two 0-deg nodes.
>>> m = 7
>>> eid = [1, 2, 3, 5, 6]
>>> dst = [1, 1, 3, 4, 4]
>>> reduce_nodes = [0, 1, 2, 3, 4]
>>> build_inc_matrix_eid(m, eid, dst, reduce_nodes)
tensor([[0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1]], shape=(5, 7))
The matrix is of shape (len(reduce_nodes), n), where n is the number of
nodes in the graph. Therefore, when doing SPMV, the src node data should be
all the node features.
Parameters
----------
m : int
The source dimension size of the incidence matrix.
eid : utils.Index
The edge ids. All ids must be in range [0, m).
dst : utils.Index
The edge destination nodes. len(eid) == len(dst).
reduce_nodes : utils.Index
The nodes to reduce messages, which will be target dimension
of the incmat. The nodes include unique(dst) and zero-degree-nodes.
edge_tuples : tuple of three utils.Index
A tuple of (u, v, eid)
num_nodes : int
The number of nodes.
Returns
-------
utils.CtxCachedObject
Can be used to get the incidence matrix on the provided ctx.
utils.Index
An index for data shuffling due to sparse format change. Returns None
if shuffling is not required.
graph : utils.CtxCachedObject
Function that generates an immutable graph index on given context
edge_map : utils.CtxCachedObject
Function that generates forward and backward edge mapping on given
context
nbits : int
Number of bits needed to represent the graph
"""
_, old2new = utils.build_relabel_map(reduce_nodes, is_sorted=True)
dst = dst.tousertensor()
u, v, eid = edge_tuples
gidx = from_coo(num_nodes, u, v, None, True)
forward, backward = gidx.get_csr_shuffle_order()
eid = eid.tousertensor()
# relabel edges dsts
new_v = old2new[dst] # FIXME(minjie): no use []
# create sparse index tensor
n = len(reduce_nodes)
row = F.unsqueeze(new_v, 0)
col = F.unsqueeze(eid, 0)
idx = F.cat([row, col], dim=0)
# create dat tensor
nnz = len(eid)
dat = F.ones((nnz,), dtype=F.float32, ctx=F.cpu())
mat, _ = F.sparse_matrix(dat, ('coo', idx), (n, m))
# inc mat will not use data tensor so conversion index is not needed
return utils.CtxCachedObject(lambda ctx: F.copy_to(mat, ctx)), None
nbits = gidx.bits_needed()
forward_map = utils.to_nbits_int(eid[forward.tousertensor()], nbits)
backward_map = utils.to_nbits_int(eid[backward.tousertensor()], nbits)
forward_map = F.zerocopy_to_dgl_ndarray(forward_map)
backward_map = F.zerocopy_to_dgl_ndarray(backward_map)
edge_map = utils.CtxCachedObject(
lambda ctx: (nd.array(forward_map, ctx=ctx),
nd.array(backward_map, ctx=ctx)))
return gidx.get_immutable_gidx, edge_map, nbits
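Besides the graph index, the function returns forward/backward edge maps because converting COO to CSR shuffles the edge order. A hedged toy illustration of the reindexing idea (simplified to the forward direction; not the library's exact layout):

import numpy as np

u = np.array([2, 0, 1])                     # source of each edge, in user (eid) order
eid = np.array([0, 1, 2])
csr_order = np.argsort(u, kind="stable")    # edge order after grouping by source node
forward_map = eid[csr_order]                # CSR position -> original edge id
edge_feat = np.array([[10.0], [20.0], [30.0]])
edge_feat_in_csr_order = edge_feat[forward_map]   # rows reordered to [20, 30, 10]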
def build_inc_matrix_dst(dst, reduce_nodes):
"""Build incidence matrix using only edge destinations.
The incidence matrix is of shape (n, m), where n=len(reduce_nodes), m=len(dst).
The nnz is equal to len(dst).
Examples
--------
Five edges. Two edges point to node 1; one points to node 3;
two point to node 4. There are five nodes in the final
target dimension (0~4), where node 0 and 2 are two 0-deg nodes.
>>> dst = [1, 1, 3, 4, 4]
>>> reduce_nodes = [0, 1, 2, 3, 4]
>>> build_inc_matrix_dst(dst, reduce_nodes)
tensor([[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 1]], shape=(5, 5))
def build_gidx_and_mapping_block(graph, block_id, edge_tuples=None):
"""Build immutable graph index and mapping for node flow
Parameters
----------
dst : utils.Index
The edge destinations.
reduce_nodes : utils.Index
The nodes to reduce messages, which will be target dimension
of the incmat. The nodes include unique(dst) and zero-degree-nodes.
graph : NodeFlow
The NodeFlow
block_id : int
the block Id
edge_tuples : tuple of three utils.Index
A tuple of (u, v, eid)
Returns
-------
utils.CtxCachedObject
Can be used to get the incidence matrix on the provided ctx.
utils.Index
An index for data shuffling due to sparse format change. Returns None
if shuffling is not required.
graph : utils.CtxCachedObject
Function that generates an immutable graph index on given context
edge_map : utils.CtxCachedObject
Function that generates forward and backward edge mapping on given
context
nbits : int
Number of bits needed to represent the graph
"""
eid = utils.toindex(F.arange(0, len(dst)))
return build_inc_matrix_eid(len(eid), eid, dst, reduce_nodes)
if edge_tuples is None:
u, v, eid = graph.block_edges(block_id, remap=True)
u = utils.toindex(u)
v = utils.toindex(v)
eid = utils.toindex(eid)
else:
u, v, eid = edge_tuples
num_nodes = max(graph.layer_size(block_id), graph.layer_size(block_id + 1))
gidx, edge_map, nbits = build_gidx_and_mapping_uv((u, v, eid), num_nodes)
return gidx, edge_map, nbits
......@@ -336,7 +336,7 @@ def build_relabel_map(x, is_sorted=False):
>>> n2o
[1, 3, 5, 6]
>>> o2n
[n/a, 0, n/a, 2, n/a, 3, 4]
[n/a, 0, n/a, 1, n/a, 2, 3]
"n/a" will be filled with 0
......@@ -490,6 +490,27 @@ def get_ndata_name(g, name):
name += '_'
return name
def get_edata_name(g, name):
"""Return an edge data name that does not exist in the given graph.
The given name is directly returned if it does not exist in the given graph.
Parameters
----------
g : DGLGraph
The graph.
name : str
The proposed name.
Returns
-------
str
The edge data name that does not exist.
"""
while name in g.edata:
name += '_'
return name
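A small, hedged usage sketch (assuming the helper is importable from dgl.utils, where this diff adds it): it is handy for stashing temporary edge fields without clobbering user data.

import torch
import dgl
from dgl import utils

g = dgl.DGLGraph()
g.add_nodes(3)
g.add_edges([0, 1], [1, 2])
g.edata['w'] = torch.ones(2, 1)

tmp = utils.get_edata_name(g, 'w')   # 'w' is taken, so 'w_' comes back
g.edata[tmp] = torch.zeros(2, 1)     # safe temporary edge field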
def unwrap_to_ptr_list(wrapper):
"""Convert the internal vector wrapper to a python list of ctypes.c_void_p.
......@@ -513,3 +534,19 @@ def unwrap_to_ptr_list(wrapper):
rst = [ctypes.c_void_p(x) for x in data.contents]
_api_internal._FreeVectorWrapper(wrapper)
return rst
def to_dgl_context(ctx):
"""Convert a backend context to DGLContext"""
device_type = nd.DGLContext.STR2MASK[F.device_type(ctx)]
device_id = F.device_id(ctx)
return nd.DGLContext(device_type, device_id)
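A hedged usage sketch (assuming the helper lives in dgl.utils as in this diff and that the backend is PyTorch): it translates a backend device into the (device_type, device_id) pair that DGL's C++ side expects, following the DLPack convention (CPU = 1, GPU = 2).

import torch
from dgl import utils

ctx = utils.to_dgl_context(torch.device('cuda:0'))
# -> nd.DGLContext with device_type=2 (kDLGPU) and device_id=0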
def to_nbits_int(tensor, nbits):
"""Change the dtype of integer tensor
The dtype of returned tensor uses nbits, nbits can only be 32 or 64
"""
assert(nbits in (32, 64)), "nbits can either be 32 or 64"
if nbits == 32:
return F.astype(tensor, F.int32)
else:
return F.astype(tensor, F.int64)
......@@ -25,6 +25,32 @@ IdArray Clone(IdArray arr) {
return ret;
}
IdArray AsNumBits(IdArray arr, uint8_t bits) {
if (arr->dtype.bits == bits) {
return arr;
} else {
const int64_t len = arr->shape[0];
IdArray ret = IdArray::Empty({len},
DLDataType{kDLInt, bits, 1}, DLContext{kDLCPU, 0});
if (arr->dtype.bits == 32 && bits == 64) {
const int32_t* arr_data = static_cast<int32_t*>(arr->data);
int64_t* ret_data = static_cast<int64_t*>(ret->data);
for (int64_t i = 0; i < len; ++i) {
ret_data[i] = arr_data[i];
}
} else if (arr->dtype.bits == 64 && bits == 32) {
const int64_t* arr_data = static_cast<int64_t*>(arr->data);
int32_t* ret_data = static_cast<int32_t*>(ret->data);
for (int64_t i = 0; i < len; ++i) {
ret_data[i] = arr_data[i];
}
} else {
LOG(FATAL) << "Invalid type conversion.";
}
return ret;
}
}
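To make the conversion rule above explicit, a hedged numpy mirror: identity when the width already matches, an element-wise int32 <-> int64 copy otherwise, and an error for any other dtype.

import numpy as np

def as_num_bits(arr, bits):
    if arr.dtype.itemsize * 8 == bits:
        return arr                              # already the requested width
    if arr.dtype == np.int64 and bits == 32:
        return arr.astype(np.int32)
    if arr.dtype == np.int32 and bits == 64:
        return arr.astype(np.int64)
    raise ValueError("Invalid type conversion.")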
IdArray Add(IdArray lhs, IdArray rhs) {
IdArray ret = NewIdArray(lhs->shape[0]);
const dgl_id_t* lhs_data = static_cast<dgl_id_t*>(lhs->data);
......
......@@ -12,6 +12,18 @@
#include <algorithm>
#include <vector>
namespace {
/*! \brief Check whether two device contexts are the same.*/
inline bool operator == (const DLContext& ctx1, const DLContext& ctx2) {
return ctx1.device_type == ctx2.device_type && ctx1.device_id == ctx2.device_id;
}
/*! \brief Output the string representation of device context.*/
inline std::ostream& operator << (std::ostream& os, const DLContext& ctx) {
return os << "" << ctx.device_type << ":" << ctx.device_id << "";
}
} // namespace
namespace dgl {
// Graph handler type
......
......@@ -137,15 +137,11 @@ DGL_REGISTER_GLOBAL("graph_index._CAPI_DGLGraphCreate")
const bool readonly = args[4];
GraphHandle ghandle;
if (readonly) {
// TODO(minjie): The array copy here is unnecessary and adds extra overhead.
// However, with MXNet backend, the memory would be corrupted if we directly
// save the passed-in ndarrays into DGL's graph object. We hope MXNet team
// could help look into this.
if (multigraph == kBoolUnknown) {
COOPtr coo(new COO(num_nodes, Clone(src_ids), Clone(dst_ids)));
COOPtr coo(new COO(num_nodes, src_ids, dst_ids));
ghandle = new ImmutableGraph(coo);
} else {
COOPtr coo(new COO(num_nodes, Clone(src_ids), Clone(dst_ids), multigraph));
COOPtr coo(new COO(num_nodes, src_ids, dst_ids, multigraph));
ghandle = new ImmutableGraph(coo);
}
} else {
......@@ -170,14 +166,10 @@ DGL_REGISTER_GLOBAL("graph_index._CAPI_DGLGraphCSRCreate")
for (size_t i = 0; i < edge_ids->shape[0]; i++)
edge_data[i] = i;
if (shared_mem_name.empty()) {
// TODO(minjie): The array copy here is unnecessary and adds extra overhead.
// However, with MXNet backend, the memory would be corrupted if we directly
// save the passed-in ndarrays into DGL's graph object. We hope MXNet team
// could help look into this.
if (multigraph == kBoolUnknown) {
csr.reset(new CSR(Clone(indptr), Clone(indices), Clone(edge_ids)));
csr.reset(new CSR(indptr, indices, edge_ids));
} else {
csr.reset(new CSR(Clone(indptr), Clone(indices), Clone(edge_ids), multigraph));
csr.reset(new CSR(indptr, indices, edge_ids, multigraph));
}
} else {
if (multigraph == kBoolUnknown) {
......@@ -200,7 +192,7 @@ DGL_REGISTER_GLOBAL("graph_index._CAPI_DGLGraphCSRCreateMMap")
const std::string shared_mem_name = args[0];
const int64_t num_vertices = args[1];
const int64_t num_edges = args[2];
const bool multigraph = static_cast<bool>(args[3]);
const bool multigraph = args[3];
const std::string edge_dir = args[4];
// TODO(minjie): how to know multigraph
CSRPtr csr(new CSR(shared_mem_name, num_vertices, num_edges, multigraph));
......@@ -523,6 +515,54 @@ DGL_REGISTER_GLOBAL("graph_index._CAPI_DGLGraphGetAdj")
*rv = ConvertAdjToPackedFunc(res);
});
DGL_REGISTER_GLOBAL("graph_index._CAPI_DGLToImmutable")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
GraphHandle ghandle = args[0];
const GraphInterface *ptr = static_cast<GraphInterface *>(ghandle);
GraphHandle newhandle = new ImmutableGraph(ImmutableGraph::ToImmutable(ptr));
*rv = newhandle;
});
DGL_REGISTER_GLOBAL("graph_index._CAPI_DGLGraphContext")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
GraphHandle ghandle = args[0];
const GraphInterface *ptr = static_cast<GraphInterface *>(ghandle);
*rv = ptr->Context();
});
DGL_REGISTER_GLOBAL("graph_index._CAPI_DGLImmutableGraphCopyTo")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
GraphHandle ghandle = args[0];
const int device_type = args[1];
const int device_id = args[2];
DLContext ctx;
ctx.device_type = static_cast<DLDeviceType>(device_type);
ctx.device_id = device_id;
const GraphInterface *ptr = static_cast<GraphInterface *>(ghandle);
const ImmutableGraph *ig = dynamic_cast<const ImmutableGraph*>(ptr);
CHECK(ig) << "Invalid argument: must be an immutable graph object.";
GraphHandle newhandle = new ImmutableGraph(ig->CopyTo(ctx));
*rv = newhandle;
});
DGL_REGISTER_GLOBAL("graph_index._CAPI_DGLGraphNumBits")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
GraphHandle ghandle = args[0];
const GraphInterface *ptr = static_cast<GraphInterface *>(ghandle);
*rv = ptr->NumBits();
});
DGL_REGISTER_GLOBAL("graph_index._CAPI_DGLImmutableGraphAsNumBits")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
GraphHandle ghandle = args[0];
int bits = args[1];
const GraphInterface *ptr = static_cast<GraphInterface *>(ghandle);
const ImmutableGraph *ig = dynamic_cast<const ImmutableGraph*>(ptr);
CHECK(ig) << "Invalid argument: must be an immutable graph object.";
GraphHandle newhandle = new ImmutableGraph(ig->AsNumBits(bits));
*rv = newhandle;
});
DGL_REGISTER_GLOBAL("transform._CAPI_DGLToSimpleGraph")
.set_body([] (DGLArgs args, DGLRetValue* rv) {
GraphHandle ghandle = args[0];
......
......@@ -455,6 +455,34 @@ COOPtr CSR::ToCOO() const {
return COOPtr(new COO(NumVertices(), ret_src, ret_dst));
}
CSR CSR::CopyTo(const DLContext& ctx) const {
if (Context() == ctx) {
return *this;
} else {
// TODO(minjie): change to use constructor later
CSR ret;
ret.indptr_ = indptr_.CopyTo(ctx);
ret.indices_ = indices_.CopyTo(ctx);
ret.edge_ids_ = edge_ids_.CopyTo(ctx);
ret.is_multigraph_ = is_multigraph_;
return ret;
}
}
CSR CSR::AsNumBits(uint8_t bits) const {
if (NumBits() == bits) {
return *this;
} else {
// TODO(minjie): change to use constructor later
CSR ret;
ret.indptr_ = dgl::AsNumBits(indptr_, bits);
ret.indices_ = dgl::AsNumBits(indices_, bits);
ret.edge_ids_ = dgl::AsNumBits(edge_ids_, bits);
ret.is_multigraph_ = is_multigraph_;
return ret;
}
}
//////////////////////////////////////////////////////////
//
// COO graph implementation
......@@ -604,6 +632,34 @@ CSRPtr COO::ToCSR() const {
return CSRPtr(new CSR(indptr, indices, edge_ids));
}
COO COO::CopyTo(const DLContext& ctx) const {
if (Context() == ctx) {
return *this;
} else {
// TODO(minjie): change to use constructor later
COO ret;
ret.num_vertices_ = num_vertices_;
ret.src_ = src_.CopyTo(ctx);
ret.dst_ = dst_.CopyTo(ctx);
ret.is_multigraph_ = is_multigraph_;
return ret;
}
}
COO COO::AsNumBits(uint8_t bits) const {
if (NumBits() == bits) {
return *this;
} else {
// TODO(minjie): change to use constructor later
COO ret;
ret.num_vertices_ = num_vertices_;
ret.src_ = dgl::AsNumBits(src_, bits);
ret.dst_ = dgl::AsNumBits(dst_, bits);
ret.is_multigraph_ = is_multigraph_;
return ret;
}
}
//////////////////////////////////////////////////////////
//
// immutable graph implementation
......@@ -666,4 +722,42 @@ std::vector<IdArray> ImmutableGraph::GetAdj(bool transpose, const std::string &f
}
}
ImmutableGraph ImmutableGraph::ToImmutable(const GraphInterface* graph) {
const ImmutableGraph* ig = dynamic_cast<const ImmutableGraph*>(graph);
if (ig) {
return *ig;
} else {
const auto& adj = graph->GetAdj(true, "csr");
CSRPtr csr(new CSR(adj[0], adj[1], adj[2]));
return ImmutableGraph(nullptr, csr);
}
}
ImmutableGraph ImmutableGraph::CopyTo(const DLContext& ctx) const {
if (ctx == Context()) {
return *this;
}
// TODO(minjie): since we don't have GPU implementation of COO<->CSR,
// we make sure that this graph (on CPU) has materialized CSR,
// and then copy them to other context (usually GPU). This should
// be fixed later.
CSRPtr new_incsr = CSRPtr(new CSR(GetInCSR()->CopyTo(ctx)));
CSRPtr new_outcsr = CSRPtr(new CSR(GetOutCSR()->CopyTo(ctx)));
return ImmutableGraph(new_incsr, new_outcsr, nullptr);
}
ImmutableGraph ImmutableGraph::AsNumBits(uint8_t bits) const {
if (NumBits() == bits) {
return *this;
} else {
// TODO(minjie): since we don't have int32 operations,
// we make sure that this graph (on CPU) has materialized CSR,
// and then convert the CSRs to the requested number of bits. This should
// be fixed later.
CSRPtr new_incsr = CSRPtr(new CSR(GetInCSR()->AsNumBits(bits)));
CSRPtr new_outcsr = CSRPtr(new CSR(GetOutCSR()->AsNumBits(bits)));
return ImmutableGraph(new_incsr, new_outcsr, nullptr);
}
}
} // namespace dgl