• [Feature][Kernel] DGL kernel support (#596) · 653428bd
    Lingfan Yu authored
    * [Kernel] Minigun integration and fused kernel support (#519)
    
    * kernel interface
    
    * add minigun
    
    * Add cuda build
    
    * functors
    
    * working on binary elewise
    
    * binary reduce
    
    * change kernel interface
    
    * WIP
    
    * wip
    
    * fix minigun
    
    * compile
    
    * binary reduce kernels
    
    * compile
    
    * simple test passed
    
    * more reducers
    
    * fix thrust problem
    
    * fix cmake
    
    * fix cmake; add proper guard for atomic
    
    * WIP: bcast
    
    * WIP
    
    * bcast kernels
    
    * update to new minigun pass-by-value practice
    
    * broadcasting dim
    
    * add copy src and copy edge
    
    * fix linking
    
    * fix none array problem
    
    * fix copy edge
    
    * add device_type and device_id to backend operator
    
    * cache csr adj, remove cache for adjmat and incmat
    
    * custom ops in backend and pytorch impl
    
    * change dgl-mg kernel python interface
    
    * add id_mapping var
    
    * clean up plus v2e spmv schedule
    
    * spmv schedule & clean up fall back
    
    * symbolic message and reduce func, remove bundle func
    
    * new executors
    
    * new backend interface for dgl kernels and pytorch impl
    
    * minor fix
    
    * fix
    
    * fix docstring, comments, func names
    
    * nodeflow
    
    * fix message id mapping and bugs...
    
    * pytorch test case & fix
    
    * backward binary reduce
    
    * fix bug
    
    * WIP: cusparse
    
    * change to int32 csr for cusparse workaround
    
    * disable cusparse
    
    * change back to int64
    
    * broadcasting backward
    
    * cusparse; WIP: add rev_csr
    
    * unit test for kernels
    
    * pytorch backward with dgl kernel
    
    * edge softmax
    
    * fix backward
    
    * improve softmax
    
    * cache edge on device
    
    * cache mappings on device
    
    * fix partial forward code
    
    * cusparse done
    
    * copy_src_sum with cusparse
    
    * rm id getter
    
    * reduce grad for broadcast
    
    * copy edge reduce backward
    
    * kernel unit test for broadcasting
    
    * full kernel unit test
    
    * add cpu kernels
    
    * edge softmax unit test
    
    * missing ref
    
    * fix compile and small bugs
    
    * fix bug in bcast
    
    * Add backward both
    
    * fix torch utests
    
    * expose infershape
    
    * create out tensor in python
    
    * fix c++ lint
    
    * [Kernel] Add GPU utest and kernel utest (#524)
    
    * fix gpu utest
    
    * cuda utest runnable
    
    * temp disable test nodeflow; unified test for kernel
    
    * cuda test kernel done
    
    * [Kernel] Update kernel branch (#550)
    
    * [Model] add multiprocessing training with sampling. (#484)
    
    * reorganize sampling code.
    
    * add multi-process training.
    
    * speed up gcn_cv
    
    * fix graphsage_cv.
    
    * add new API in graph store.
    
    * update barrier impl.
    
    * support both local and distributed training.
    
    * fix multiprocess train.
    
    * fix.
    
    * fix barrier.
    
    * add script for loading data.
    
    * multiprocessing sampling.
    
    * accel training.
    
    * replace pull with spmv for speedup.
    
    * nodeflow copy from parent with context.
    
    * enable GPU.
    
    * fix a bug in graph store.
    
    * enable multi-GPU training.
    
    * fix lint.
    
    * add comments.
    
    * rename to run_store_server.py
    
    * fix gcn_cv.
    
    * fix a minor bug in sampler.
    
    * handle error better in graph store.
    
    * improve graphsage_cv for distributed mode.
    
    * update README.
    
    * fix.
    
    * update.
    
    * [Tutorial] add sampling tutorial. (#522)
    
    * add sampling tutorial.
    
    * add readme
    
    * update author list.
    
    * fix indent in the code.
    
    * rename the file.
    
    * update tutorial.
    
    * fix the last API.
    
    * update image.
    
    * [BUGFIX] fix the problems in the sampling tutorial. (#523)
    
    * add index.
    
    * update.
    
    * update tutorial.
    
    * fix gpu utest
    
    * cuda utest runnable
    
    * temp disable test nodeflow; unified test for kernel
    
    * cuda test kernel done
    
    * Fixing typo in JTNN after interface change (#536)
    
    * [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515)
    
    * [Bug Fix] Fix inplace op at backend (#546)
    
    * Fix inplace operation
    
    * fix line separator
    
    * [Feature] Add batch and unbatch for immutable graph (#539)
    
    * Add batch and unbatch for immutable graph
    
    * fix line separator
    
    * fix lintr
    
    * remove unnecessary include
    
    * fix code review
    
    * [BUGFix] Improve multi-processing training (#526)
    
    * fix.
    
    * add comment.
    
    * remove.
    
    * temp fix.
    
    * initialize for shared memory.
    
    * fix graphsage.
    
    * fix gcn.
    
    * add more unit tests.
    
    * add more tests.
    
    * avoid creating shared-memory exclusively.
    
    * redefine remote initializer.
    
    * improve initializer.
    
    * fix unit test.
    
    * fix lint.
    
    * fix lint.
    
    * initialize data in the graph store server properly.
    
    * fix test.
    
    * fix test.
    
    * fix test.
    
    * small fix.
    
    * add comments.
    
    * cleanup server.
    
    * test graph store with a random port.
    
    * print.
    
    * print to stderr.
    
    * test1
    
    * test2
    
    * remove comment.
    
    * adjust the initializer signature.
    
    * [API] update graph store API. (#549)
    
    * add init_ndata and init_edata in DGLGraph.
    
    * adjust SharedMemoryGraph API.
    
    * print warning.
    
    * fix comment.
    
    * update example
    
    * fix.
    
    * fix examples.
    
    * add unit tests.
    
    * add comments.
    
    * [Refactor] Immutable graph index (#543)
    
    * WIP
    
    * header
    
    * WIP .cc
    
    * WIP
    
    * transpose
    
    * wip
    
    * immutable graph .h and .cc
    
    * WIP: nodeflow.cc
    
    * compile
    
    * remove all tmp dl managed ctx; they caused refcount issue
    
    * one simple test
    
    * WIP: testing
    
    * test_graph
    
    * fix graph index
    
    * fix bug in sampler; pass pytorch utest
    
    * WIP on mxnet
    
    * fix lint
    
    * fix mxnet unittest w/ unfortunate workaround
    
    * fix msvc
    
    * fix lint
    
    * SliceRows and test_nodeflow
    
    * resolve reviews
    
    * resolve reviews
    
    * try fix win ci
    
    * try fix win ci
    
    * poke win ci again
    
    * poke
    
    * lazy multigraph flag; stackoverflow error
    
    * revert node subgraph test
    
    * lazy object
    
    * try fix win build
    
    * try fix win build
    
    * poke ci
    
    * fix build script
    
    * fix compile
    
    * add a todo
    
    * fix reviews
    
    * fix compile
    
    * [Kernel] Update kernel branch (#576)
    
    * [Model] add multiprocessing training with sampling. (#484)
    
    * reorganize sampling code.
    
    * add multi-process training.
    
    * speed up gcn_cv
    
    * fix graphsage_cv.
    
    * add new API in graph store.
    
    * update barrier impl.
    
    * support both local and distributed training.
    
    * fix multiprocess train.
    
    * fix.
    
    * fix barrier.
    
    * add script for loading data.
    
    * multiprocessing sampling.
    
    * accel training.
    
    * replace pull with spmv for speedup.
    
    * nodeflow copy from parent with context.
    
    * enable GPU.
    
    * fix a bug in graph store.
    
    * enable multi-GPU training.
    
    * fix lint.
    
    * add comments.
    
    * rename to run_store_server.py
    
    * fix gcn_cv.
    
    * fix a minor bug in sampler.
    
    * handle error better in graph store.
    
    * improve graphsage_cv for distributed mode.
    
    * update README.
    
    * fix.
    
    * update.
    
    * [Tutorial] add sampling tutorial. (#522)
    
    * add sampling tutorial.
    
    * add readme
    
    * update author list.
    
    * fix indent in the code.
    
    * rename the file.
    
    * update tutorial.
    
    * fix the last API.
    
    * update image.
    
    * [BUGFIX] fix the problems in the sampling tutorial. (#523)
    
    * add index.
    
    * update.
    
    * update tutorial.
    
    * fix gpu utest
    
    * cuda utest runnable
    
    * temp disable test nodeflow; unified test for kernel
    
    * cuda test kernel done
    
    * Fixing typo in JTNN after interface change (#536)
    
    * [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515)
    
    * [Bug Fix] Fix inplace op at backend (#546)
    
    * Fix inplace operation
    
    * fix line separator
    
    * [Feature] Add batch and unbatch for immutable graph (#539)
    
    * Add batch and unbatch for immutable graph
    
    * fix line separator
    
    * fix lintr
    
    * remove unnecessary include
    
    * fix code review
    
    * [BUGFix] Improve multi-processing training (#526)
    
    * fix.
    
    * add comment.
    
    * remove.
    
    * temp fix.
    
    * initialize for shared memory.
    
    * fix graphsage.
    
    * fix gcn.
    
    * add more unit tests.
    
    * add more tests.
    
    * avoid creating shared-memory exclusively.
    
    * redefine remote initializer.
    
    * improve initializer.
    
    * fix unit test.
    
    * fix lint.
    
    * fix lint.
    
    * initialize data in the graph store server properly.
    
    * fix test.
    
    * fix test.
    
    * fix test.
    
    * small fix.
    
    * add comments.
    
    * cleanup server.
    
    * test graph store with a random port.
    
    * print.
    
    * print to stderr.
    
    * test1
    
    * test2
    
    * remove comment.
    
    * adjust the initializer signature.
    
    * [API] update graph store API. (#549)
    
    * add init_ndata and init_edata in DGLGraph.
    
    * adjust SharedMemoryGraph API.
    
    * print warning.
    
    * fix comment.
    
    * update example
    
    * fix.
    
    * fix examples.
    
    * add unit tests.
    
    * add comments.
    
    * [Refactor] Immutable graph index (#543)
    
    * WIP
    
    * header
    
    * WIP .cc
    
    * WIP
    
    * transpose
    
    * wip
    
    * immutable graph .h and .cc
    
    * WIP: nodeflow.cc
    
    * compile
    
    * remove all tmp dl managed ctx; they caused refcount issue
    
    * one simple test
    
    * WIP: testing
    
    * test_graph
    
    * fix graph index
    
    * fix bug in sampler; pass pytorch utest
    
    * WIP on mxnet
    
    * fix lint
    
    * fix mxnet unittest w/ unfortunate workaround
    
    * fix msvc
    
    * fix lint
    
    * SliceRows and test_nodeflow
    
    * resolve reviews
    
    * resolve reviews
    
    * try fix win ci
    
    * try fix win ci
    
    * poke win ci again
    
    * poke
    
    * lazy multigraph flag; stackoverflow error
    
    * revert node subgraph test
    
    * lazy object
    
    * try fix win build
    
    * try fix win build
    
    * poke ci
    
    * fix build script
    
    * fix compile
    
    * add a todo
    
    * fix reviews
    
    * fix compile
    
    * all demo use python-3 (#555)
    
    * [DEMO] Reproduce numbers of distributed training in AMLC giant graph paper (#556)
    
    * update
    
    * update
    
    * update
    
    * update num_hops
    
    * fix bug
    
    * update
    
    * report numbers of distributed training in AMLC giant graph paper
    
    * [DEMO] Remove duplicate code for sampling (#557)
    
    * update
    
    * update
    
    * re-use single-machine code
    
    * update
    
    * use relative path
    
    * update
    
    * update
    
    * update
    
    * add __init__.py
    
    * add __init__.py
    
    * import sys, os
    
    * fix typo
    
    * update
    
    * [Perf] Improve performance of graph store. (#554)
    
    * fix.
    
    * use inplace.
    
    * move to shared memory graph store.
    
    * fix.
    
    * add more unit tests.
    
    * fix.
    
    * fix test.
    
    * fix test.
    
    * disable test.
    
    * fix.
    
    * [BUGFIX] fix a bug in edge_ids (#560)
    
    * add test.
    
    * fix compute.
    
    * fix test.
    
    * turn on test.
    
    * fix a bug.
    
    * add test.
    
    * fix.
    
    * disable test.
    
    * [DEMO] Add Pytorch demo for distributed sampler (#562)
    
    * update
    
    * update
    
    * update
    
    * add sender
    
    * update
    
    * remove duplicate code
    
    * [Test] Add gtest to project (#547)
    
    * add gtest module
    
    * add gtest
    
    * fix
    
    * Update CMakeLists.txt
    
    * Update README.md
    
    * [Perf] lazily create msg_index. (#563)
    
    * lazily create msg_index.
    
    * update test.
    
    * [BUGFIX] fix bugs for running GCN on giant graphs. (#561)
    
    * load mxnet csr.
    
    * enable load large csr.
    
    * fix
    
    * fix.
    
    * fix int overflow.
    
    * fix test.
    
    * [BugFix] Fix error when bfs_level = 0 in Entity Classification with RGCN (#559)
    
    * [DEMO] Update demo of distributed sampler (#564)
    
    * update
    
    * update
    
    * update demo
    
    * add network cpp test (#565)
    
    * Add unittest for C++ RPC (#566)
    
    * [CI] Fix CI for cpp test (#570)
    
    * fix CI for cpp test
    
    * update port number
    
    * [Docker] update docker image (#575)
    
    * update docker image
    
    * specify lint version
    
    * rm torch import from unified tests
    
    * [Kernel][Scheduler][MXNet] Scheduler for DGL kernels and MXNet backend support (#541)
    
    * [Model] add multiprocessing training with sampling. (#484)
    
    * reorganize sampling code.
    
    * add multi-process training.
    
    * speed up gcn_cv
    
    * fix graphsage_cv.
    
    * add new API in graph store.
    
    * update barrier impl.
    
    * support both local and distributed training.
    
    * fix multiprocess train.
    
    * fix.
    
    * fix barrier.
    
    * add script for loading data.
    
    * multiprocessing sampling.
    
    * accel training.
    
    * replace pull with spmv for speedup.
    
    * nodeflow copy from parent with context.
    
    * enable GPU.
    
    * fix a bug in graph store.
    
    * enable multi-GPU training.
    
    * fix lint.
    
    * add comments.
    
    * rename to run_store_server.py
    
    * fix gcn_cv.
    
    * fix a minor bug in sampler.
    
    * handle error better in graph store.
    
    * improve graphsage_cv for distributed mode.
    
    * update README.
    
    * fix.
    
    * update.
    
    * [Tutorial] add sampling tutorial. (#522)
    
    * add sampling tutorial.
    
    * add readme
    
    * update author list.
    
    * fix indent in the code.
    
    * rename the file.
    
    * update tutorial.
    
    * fix the last API.
    
    * update image.
    
    * [BUGFIX] fix the problems in the sampling tutorial. (#523)
    
    * add index.
    
    * update.
    
    * update tutorial.
    
    * fix gpu utest
    
    * cuda utest runnable
    
    * temp disable test nodeflow; unified test for kernel
    
    * cuda test kernel done
    
    * edge softmax module
    
    * WIP
    
    * Fixing typo in JTNN after interface change (#536)
    
    * mxnet backend support
    
    * improve reduce grad
    
    * add max to unittest backend
    
    * fix kernel unittest
    
    * [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515)
    
    * lint
    
    * lint
    
    * win build
    
    * [Bug Fix] Fix inplace op at backend (#546)
    
    * Fix inplace operation
    
    * fix line separator
    
    * [Feature] Add batch and unbatch for immutable graph (#539)
    
    * Add batch and unbatch for immutable graph
    
    * fix line separator
    
    * fix lintr
    
    * remove unnecessary include
    
    * fix code review
    
    * [BUGFix] Improve multi-processing training (#526)
    
    * fix.
    
    * add comment.
    
    * remove.
    
    * temp fix.
    
    * initialize for shared memory.
    
    * fix graphsage.
    
    * fix gcn.
    
    * add more unit tests.
    
    * add more tests.
    
    * avoid creating shared-memory exclusively.
    
    * redefine remote initializer.
    
    * improve initializer.
    
    * fix unit test.
    
    * fix lint.
    
    * fix lint.
    
    * initialize data in the graph store server properly.
    
    * fix test.
    
    * fix test.
    
    * fix test.
    
    * small fix.
    
    * add comments.
    
    * cleanup server.
    
    * test graph store with a random port.
    
    * print.
    
    * print to stderr.
    
    * test1
    
    * test2
    
    * remove comment.
    
    * adjust the initializer signature.
    
    * try
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * try
    
    * test
    
    * test
    
    * test
    
    * try
    
    * try
    
    * try
    
    * test
    
    * fix
    
    * try gen_target
    
    * fix gen_target
    
    * fix msvc var_args expand issue
    
    * fix
    
    * [API] update graph store API. (#549)
    
    * add init_ndata and init_edata in DGLGraph.
    
    * adjust SharedMemoryGraph API.
    
    * print warning.
    
    * fix comment.
    
    * update example
    
    * fix.
    
    * fix examples.
    
    * add unit tests.
    
    * add comments.
    
    * [Refactor] Immutable graph index (#543)
    
    * WIP
    
    * header
    
    * WIP .cc
    
    * WIP
    
    * transpose
    
    * wip
    
    * immutable graph .h and .cc
    
    * WIP: nodeflow.cc
    
    * compile
    
    * remove all tmp dl managed ctx; they caused refcount issue
    
    * one simple test
    
    * WIP: testing
    
    * test_graph
    
    * fix graph index
    
    * fix bug in sampler; pass pytorch utest
    
    * WIP on mxnet
    
    * fix lint
    
    * fix mxnet unittest w/ unfortunate workaround
    
    * fix msvc
    
    * fix lint
    
    * SliceRows and test_nodeflow
    
    * resolve reviews
    
    * resolve reviews
    
    * try fix win ci
    
    * try fix win ci
    
    * poke win ci again
    
    * poke
    
    * lazy multigraph flag; stackoverflow error
    
    * revert node subgraph test
    
    * lazy object
    
    * try fix win build
    
    * try fix win build
    
    * poke ci
    
    * fix build script
    
    * fix compile
    
    * add a todo
    
    * fix reviews
    
    * fix compile
    
    * WIP
    
    * WIP
    
    * all demo use python-3 (#555)
    
    * ToImmutable and CopyTo
    
    * [DEMO] Reproduce numbers of distributed training in AMLC giant graph paper (#556)
    
    * update
    
    * update
    
    * update
    
    * update num_hops
    
    * fix bug
    
    * update
    
    * report numbers of distributed training in AMLC giant graph paper
    
    * [DEMO] Remove duplicate code for sampling (#557)
    
    * update
    
    * update
    
    * re-use single-machine code
    
    * update
    
    * use relative path
    
    * update
    
    * update
    
    * update
    
    * add __init__.py
    
    * add __init__.py
    
    * import sys, os
    
    * fix typo
    
    * update
    
    * [Perf] Improve performance of graph store. (#554)
    
    * fix.
    
    * use inplace.
    
    * move to shared memory graph store.
    
    * fix.
    
    * add more unit tests.
    
    * fix.
    
    * fix test.
    
    * fix test.
    
    * disable test.
    
    * fix.
    
    * [BUGFIX] fix a bug in edge_ids (#560)
    
    * add test.
    
    * fix compute.
    
    * fix test.
    
    * turn on test.
    
    * fix a bug.
    
    * add test.
    
    * fix.
    
    * disable test.
    
    * DGLRetValue DGLContext conversion
    
    * [DEMO] Add Pytorch demo for distributed sampler (#562)
    
    * update
    
    * update
    
    * update
    
    * add sender
    
    * update
    
    * remove duplicate code
    
    * [Test] Add gtest to project (#547)
    
    * add gtest module
    
    * add gtest
    
    * fix
    
    * Update CMakeLists.txt
    
    * Update README.md
    
    * Add support to convert immutable graph to 32 bits
    
    * [Perf] lazily create msg_index. (#563)
    
    * lazily create msg_index.
    
    * update test.
    
    * fix binary reduce following new minigun template
    
    * enable both int64 and int32 kernels
    
    * [BUGFIX] fix bugs for running GCN on giant graphs. (#561)
    
    * load mxnet csr.
    
    * enable load large csr.
    
    * fix
    
    * fix.
    
    * fix int overflow.
    
    * fix test.
    
    * new kernel interface done for CPU
    
    * docstring
    
    * rename & docstring
    
    * copy reduce and backward
    
    * [BugFix] Fix error when bfs_level = 0 in Entity Classification with RGCN (#559)
    
    * [DEMO] Update demo of distributed sampler (#564)
    
    * update
    
    * update
    
    * update demo
    
    * adapt cuda kernels to the new interface
    
    * add network cpp test (#565)
    
    * fix bug
    
    * Add unittest for C++ RPC (#566)
    
    * [CI] Fix CI for cpp test (#570)
    
    * fix CI for cpp test
    
    * update port number
    
    * [Docker] update docker image (#575)
    
    * update docker image
    
    * specify lint version
    
    * rm torch import from unified tests
    
    * remove pytorch-specific test_function
    
    * fix unittest
    
    * fix
    
    * fix unittest backend bug in converting tensor to numpy array
    
    * fix
    
    * mxnet version
    
    * [BUGFIX] fix for MXNet 1.5. (#552)
    
    * remove clone.
    
    * turn on numpy compatible.
    
    * Revert "remove clone."
    
    This reverts commit 17bbf76ed72ff178df6b3f35addc428048672457.
    
    * revert format changes
    
    * fix mxnet api name
    
    * revert mistakes in previous revert
    
    * roll back CI to 20190523 build
    
    * fix unittest
    
    * disable test_shared_mem_store.py for now
    
    * remove mxnet/test_specialization.py
    
    * sync win64 test script
    
    * fix lowercase
    
    * missing backend in gpu unit test
    
    * transpose to get forward graph
    
    * pass update all
    
    * add sanity check
    
    * passing test_specialization.py
    
    * fix and pass test_function
    
    * fix check
    
    * fix pytorch softmax
    
    * mxnet kernels
    
    * c++ lint
    
    * pylint
    
    * try
    
    * win build
    
    * fix
    
    * win
    
    * ci enable gpu build
    
    * init submodule recursively
    
    * backend docstring
    
    * try
    
    * test win dev
    
    * doc string
    
    * disable pytorch test_nn
    
    * try to fix windows issue
    
    * bug fixed, revert changes
    
    * [Test] fix CI. (#586)
    
    * disable unit test in mxnet tutorial.
    
    * retry socket connection.
    
    * roll back to set_np_compat
    
    * try to fix multi-processing test hangs when it fails.
    
    * fix test.
    
    * fix.
    
    * doc string
    
    * doc string and clean up
    
    * missing field in ctypes
    
    * fix node flow schedule and unit test
    
    * rename
    
    * pylint
    
    * copy from parent default context
    
    * fix unit test script
    
    * fix
    
    * demo bug in nodeflow gpu test
    
    * [Kernel][Bugfix] fix nodeflow bug (#604)
    
    * fix nodeflow bug
    
    * remove debug code
    
    * add build gtest option
    
    * fix cmake; fix graph index bug in spmv.py
    
    * remove clone
    
    * fix div rhs grad bug
    
    * [Kernel] Support full builtin method, edge softmax and unit tests (#605)
    
    * add full builtin support
    
    * unit test
    
    * unit test backend
    
    * edge softmax
    
    * apply edge with builtin
    
    * fix kernel unit test
    
    * disable mxnet test_shared_mem_store
    
    * gen builtin reduce
    
    * enable mxnet gpu unittest
    
    * revert some changes
    
    * docstring
    
    * add note for the hack
    
    * [Kernel][Unittest][CI] Fix MXNet GPU CI (#607)
    
    * update docker image for MXNet GPU CI
    
    * force all dgl graph input and output on CPU
    
    * fix gpu unittest
    
    * speedup compilation
    
    * add some comments
    
    * lint
    
    * add more comments
    
    * fix as requested
    
    * add some comments
    
    * comment
    
    * lint
    
    * lint
    
    * update pylint
    
    * fix as requested
    
    * lint
    
    * lint
    
    * lint
    
    * docstrings of python DGL kernel entries
    
    * disable lint warnings on arguments in kernel.py
    
    * fix docstring in scheduler
    
    * fix some bug in unittest; try again
    
    * Revert "Merge branch 'kernel' of github.com:zzhang-cn/dgl into kernel"
    
    This reverts commit 1d2299e68b004182ea6130b088de1f1122b18a49, reversing
    changes made to ddc97fbf1bec2b7815c0da7c74f7ecb2f428889b.
    
    * Revert "fix some bug in unittest; try again"
    
    This reverts commit ddc97fbf1bec2b7815c0da7c74f7ecb2f428889b.
    
    * more comprehensive kernel test
    
    * remove shape check in test_specialization
task_unit_test.sh 578 Bytes