- 29 Sep, 2021 1 commit
Rhett Ying authored
* [Feature] enable create/set/free cuda stream for internal use
* add unit test
* fix unit test failure on mxnet and tf
* refactor stream wrapper
* fix lint error
* fix lint error
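The stream work above is internal C++ plumbing, but the create/set/free lifecycle it wraps can be sketched with PyTorch's public stream API (torch names only; DGL's internal calls are not exposed in Python, so this is an analogy, not the PR's code):

```python
# Minimal sketch of the create/set/free stream lifecycle, using PyTorch's
# public CUDA stream API rather than DGL's internal wrapper.
import torch

if torch.cuda.is_available():
    stream = torch.cuda.Stream()              # "create" a non-default stream
    with torch.cuda.stream(stream):           # "set" it as the current stream
        x = torch.randn(1024, device="cuda")  # this work is queued on `stream`
        y = x * 2
    torch.cuda.current_stream().wait_stream(stream)  # order work before reuse
    # "free" happens implicitly when `stream` is garbage-collected
```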
- 06 Sep, 2021 1 commit
Jinjing Zhou authored
* remove
* remove
* fix
* remove
* remove
- 19 Aug, 2021 1 commit
nv-dlasalle authored
* Update filter code
* Add unit tests
* Fixes
* Switch to indices
* Rename functions
* Fix linting
* Fix whitespace
* Add doc
* Fix heterograph
* Change workspace allocation
* Fix linting
* Fix docs in filter.py
* Add todo

Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
- 16 Jul, 2021 1 commit
David Min authored
[Feature][Performance][GPU] Introducing UnifiedTensor for efficient zero-copy host memory access from GPU (#3086)

* Add pytorch-direct version
* Initial commit of unified tensor
* Merge branch 'master' of https://github.com/davidmin7/dgl
* Remove unnecessary things
* Fix error message
* Fix/Add descriptions
* whitespace fix
* add unpin
* disable IndexSelectCPUFromGPU with no CUDA
* add a newline for unified_tensor.py
* Apply changes based on feedback
* add 'os' module
* skip unified tensor unit test for cpu only
* Update tests/pytorch/test_unified_tensor.py Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
* reflect feedback

Co-authored-by: shhssdm <shhssdm@gmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
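For context, the usage pattern this feature enables looks roughly like the sketch below; the `dgl.contrib.UnifiedTensor` constructor signature is assumed from the 0.7-era documentation and may differ in other releases:

```python
# Rough usage sketch of UnifiedTensor (zero-copy host memory access from GPU).
# The dgl.contrib.UnifiedTensor signature is an assumption from 0.7-era docs.
import torch
import dgl.contrib

feats = torch.randn(1_000_000, 128)   # large feature table stays in host RAM
unified = dgl.contrib.UnifiedTensor(feats, device=torch.device("cuda"))  # pins `feats`

idx = torch.randint(0, feats.shape[0], (1024,), device="cuda")
batch = unified[idx]   # GPU threads gather rows directly from pinned host
                       # memory; no bulk cudaMemcpy of the whole table
```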
- 27 Jun, 2021 1 commit
Jinjing Zhou authored
* fix
* remove nvidiasmi
* fix
* fix docs
* fix
* fix
* 1
* fix
* remove
* skip deprecated kernel
* fix
* Revert "skip deprecated kernel" This reverts commit c5ceb7f60dbbaf065b81cc3680757fd611d90ad3.
* fix
- 23 Jun, 2021 1 commit
nv-dlasalle authored
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
- 11 Jun, 2021 1 commit
nv-dlasalle authored
* Split from NCCL PR
* Fix typo in comment
* Expand documentation for sparse_all_to_all_push
* Restore previous behavior in example
* Re-work optimizer to use NCCL based on gradient location
* Allow for running with embedding on CPU but using NCCL for gradient exchange
* Optimize single partition case
* Fix pylint errors
* Add missing include
* fix gradient indexing
* Fix line continuation
* Migrate 'first_step'
* Skip tests without enough GPUs to run NCCL
* Improve empty tensor handling for pytorch 1.5
* Fix indentation
* Allow multiple NCCL communicators to coexist
* Improve handling of empty message
* Update python/dgl/nn/pytorch/sparse_emb.py Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
* Update python/dgl/nn/pytorch/sparse_emb.py Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
* Keep empty tensors dimensionless
* th.empty -> th.tensor
* Preserve shape for empty non-zero dimension tensors
* Use shared state, when embedding is shared
* Add support for gathering an embedding
* Fix typo
* Fix more typos
* Fix backend call
* Use NodeDataLoader to take advantage of ddp
* Update training script to share memory
* Only squeeze last dimension
* Better handle empty message
* Keep embedding on the target GPU if dgl_sparse is false in RGCN example
* Fix typo in comment
* Add asserts
* Improve documentation in example

Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
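The user-facing pieces touched here are `dgl.nn.NodeEmbedding` and the sparse optimizers; a minimal single-process training-loop sketch follows (under DDP, per the commit, the optimizer exchanges sparse gradients over NCCL based on where they live):

```python
# Minimal sketch of the NodeEmbedding + sparse-optimizer loop this PR reworks.
# Single process shown; the NCCL gradient exchange happens under torch DDP.
import torch
import dgl

g = dgl.rand_graph(1000, 5000)
emb = dgl.nn.NodeEmbedding(g.num_nodes(), 16, name="node_emb")  # sparse embedding
opt = dgl.optim.SparseAdam([emb], lr=0.01)                      # sparse optimizer

nids = torch.arange(0, 64)
feats = emb(nids, torch.device("cpu"))   # gather rows onto the compute device
loss = (feats ** 2).sum()
loss.backward()
opt.step()   # applies (and, when distributed, exchanges) the sparse gradients
```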
- 20 May, 2021 1 commit
nv-dlasalle authored
[Feature][Performance] Implement NCCL wrapper for communicating NodeEmbeddings and sparse gradients. (#2825)

* Split NCCL wrapper from sparse optimizer and sparse embedding
* Add more unit tests for single node nccl
* Fix unit test for tf
* Switch to device histogram
* Fix histogram issues
* Finish migration to histogram
* Handle cases with zero send/receive data
* Start on partition object
* Get compiling
* Updates
* Add unit tests
* Switch to partition object
* Fix linting issues
* Rename partition file
* Add python doc
* Fix python assert and finish doxygen comments
* Remove stubs for range based partition to satisfy pylint
* Wrap unit test in GPU only
* Wrap explicit cuda call in ifdef
* Merge with partition.py
* update docstrings
* Cleanup partition_op
* Add Workspace object
* Switch to using workspace object
* Move last remainder based function out of nccl_api
* Add error messages
* Update docs with examples
* Fix linting errors

Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
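The wrapper itself is C++ and not public API; to make the "remainder-based partition plus sparse all-to-all" idea concrete, here is a conceptual `torch.distributed` sketch. It assumes an initialized NCCL process group with CUDA tensors; `sparse_push` is a hypothetical helper name, not DGL's `nccl_api`:

```python
# Conceptual sketch of a sparse all-to-all push with a remainder partition,
# written against torch.distributed (NOT DGL's internal nccl_api). Assumes
# dist.init_process_group("nccl") has already run and tensors are on CUDA.
import torch
import torch.distributed as dist

def sparse_push(indices: torch.Tensor) -> torch.Tensor:
    world = dist.get_world_size()
    owner = indices % world                        # remainder-based partition
    order = torch.argsort(owner)
    indices = indices[order]                       # group by destination rank
    send = torch.bincount(owner, minlength=world)
    recv = torch.empty_like(send)
    dist.all_to_all_single(recv, send)             # exchange counts first
    out = indices.new_empty(int(recv.sum()))
    dist.all_to_all_single(out, indices,
                           output_split_sizes=recv.tolist(),
                           input_split_sizes=send.tolist())
    return out   # indices this rank now owns; values travel the same way
```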
- 22 Mar, 2021 1 commit
nv-dlasalle authored
[Bugfix] Wrap cub with CUB_NS_PREFIX and remove dependency on Thrust to fix linking issues with Torch 1.8 (#2758)

* Wrap cub with prefixes and remove thrust
* Using counting iterator

Co-authored-by: Zihao Ye <expye@outlook.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
- 09 Mar, 2021 1 commit
Tianqi Zhang (张天启) authored
* finish graph matching gpu version
* use C++ shuffle
* finish graph matching
* fix bug
* fix bug
* change name and use swap
* upt
* fix format problem
* fix format problem
* stronger test
* upt
* upt
* change python api
* upt
* upt
* format check
* upt
* upt
* fix bug

Co-authored-by: Tong He <hetong007@gmail.com>
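As context for what "graph matching" computes here, below is a pure-Python sketch of the greedy maximal-matching idea: visit edges in a shuffled order and pair two endpoints only if both are still unmatched. This illustrates the algorithm, not the C++/CUDA code the commit lands:

```python
# Illustrative greedy maximal matching over an edge list (src[i] -> dst[i]).
# Pure-torch sketch of the algorithm, not DGL's C++/CUDA implementation.
import torch

def greedy_matching(src: torch.Tensor, dst: torch.Tensor, num_nodes: int) -> torch.Tensor:
    match = torch.full((num_nodes,), -1, dtype=torch.long)
    for e in torch.randperm(src.numel()).tolist():   # the "shuffle" step
        u, v = int(src[e]), int(dst[e])
        if u != v and match[u] < 0 and match[v] < 0:
            match[u], match[v] = v, u                # pair the two endpoints
    return match                                     # -1 marks unmatched nodes
```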
- 08 Feb, 2021 1 commit
nv-dlasalle authored
* Add start of to_block gpu implementation
* Pull in more changes from 0.4.2 cuda_to_block
* Move more code to IdArray
* Refactor DeviceNodeMapMaker
* Updates
* get compiling
* Integrate to_block
* Fix ID allocation
* Minor fixes
* Cleanup cuda calls to use cuda_common
* Reduce kernel calls
* Lint cleanup
* Expand documentation
* Remove unused function
* Rename variables for consistency
* Add doxygen comments
* Fix file extension
* Remove raw asynccopy for deviceapi
* Remove unused function
* Fix block/tile configuration
* Add cuda_device_common.cuh
* Add basic hashtable
* Migrate part of hashtable
* Refactor to use external hashtable
* Make functions members
* Format hash table functions
* Migrate duplicate filling
* Move last function over
* Refactor with cu file
* lint c++ code
* Move context check to C++ code
* Use macro switch
* Add missing files
* Update docstring
* update docs
* Move atomic functions
* Refactor hashtable
* Fix linting
* Expand docs
* Fix mismatched argument names
* Switch doxygen comments from using @param to \param

Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
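The user-visible entry point here is `dgl.to_block`; a small usage sketch of the path being moved to the GPU (assumes a CUDA build and device; current public signatures):

```python
# Usage-level sketch of the code path this PR accelerates: turning a sampled
# frontier into a bipartite block on-device, without a CPU round-trip.
import torch
import dgl

g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3]))).to("cuda")
seeds = torch.tensor([1, 2, 3], device="cuda")
frontier = dgl.in_subgraph(g, seeds)   # edges incoming to the seed nodes
block = dgl.to_block(frontier, seeds)  # node compaction + ID relabeling on GPU
print(block.num_src_nodes(), block.num_dst_nodes())
```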
- 29 Jan, 2021 1 commit
Quan (Andy) Gan authored
- 28 Jan, 2021 1 commit
Zihao Ye authored
* add tvm as submodule
* compilation is ok but calling fails
* can call now
* pack multiple modules, change names
* upd
* upd
* upd
* fix cmake
* upd
* upd
* upd
* upd
* fix
* relative path
* upd
* upd
* upd
* singleton
* upd
* trigger
* fix
* upd
* count reducible
* upd
* upd
* upd
* upd
* upd
* upd
* upd
* upd
* upd
* only keep related files
* upd
* upd
* upd
* upd
* lint
* lint
* lint
* lint
* pylint
* upd
* upd
* compilation
* fix
* upd
* upd
* upd
* upd
* upd
* upd
* upd doc
* refactor
* fix
* upd number

Co-authored-by: Zhi Lin <linzhilynn@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-42-78.us-east-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-21-156.us-east-2.compute.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
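For the "pack multiple modules" and "singleton" items above, a hedged sketch using public TVM APIs; the kernel, file name, and shapes are illustrative of the general mechanism, not DGL's actual featgraph build script:

```python
# Hedged sketch: compile a kernel with TVM, export it into one shared library
# (several built kernels can share one .so), and load that library once at
# runtime. Names/shapes are illustrative; TE-schedule APIs are era-specific.
import tvm
from tvm import te

n = te.var("n")
x = te.placeholder((n,), name="x")
y = te.compute((n,), lambda i: x[i] + 1.0, name="y")
sched = te.create_schedule(y.op)
mod = tvm.build(sched, [x, y], target="llvm", name="add_one")
mod.export_library("packed_kernels.so")

loaded = tvm.runtime.load_module("packed_kernels.so")  # load once ("singleton")
add_one = loaded.get_function("add_one")
```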
- 02 Nov, 2020 1 commit
nv-dlasalle authored
* Update docs
* Make non-default streams non-blocking
- 10 Sep, 2020 1 commit
Zihao Ye authored
* upd
* upd
* upd
* upd
* lint
* upd
* upd
* fmt

Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
- 06 Jun, 2019 1 commit
Lingfan Yu authored
* [Kernel] Minigun integration and fused kernel support (#519) * kernel interface * add minigun * Add cuda build * functors * working on binary elewise * binary reduce * change kernel interface * WIP * wip * fix minigun * compile * binary reduce kernels * compile * simple test passed * more reducers * fix thrust problem * fix cmake * fix cmake; add proper guard for atomic * WIP: bcast * WIP * bcast kernels * update to new minigun pass-by-value practice * broadcasting dim * add copy src and copy edge * fix linking * fix none array problem * fix copy edge * add device_type and device_id to backend operator * cache csr adj, remove cache for adjmat and incmat * custom ops in backend and pytorch impl * change dgl-mg kernel python interface * add id_mapping var * clean up plus v2e spmv schedule * spmv schedule & clean up fall back * symbolic message and reduce func, remove bundle func * new executors * new backend interface for dgl kernels and pytorch impl * minor fix * fix * fix docstring, comments, func names * nodeflow * fix message id mapping and bugs... * pytorch test case & fix * backward binary reduce * fix bug * WIP: cusparse * change to int32 csr for cusparse workaround * disable cusparse * change back to int64 * broadcasting backward * cusparse; WIP: add rev_csr * unit test for kernels * pytorch backward with dgl kernel * edge softmax * fix backward * improve softmax * cache edge on device * cache mappings on device * fix partial forward code * cusparse done * copy_src_sum with cusparse * rm id getter * reduce grad for broadcast * copy edge reduce backward * kernel unit test for broadcasting * full kernel unit test * add cpu kernels * edge softmax unit test * missing ref * fix compile and small bugs * fix bug in bcast * Add backward both * fix torch utests * expose infershape * create out tensor in python * fix c++ lint
* [Kernel] Add GPU utest and kernel utest (#524) * fix gpu utest * cuda utest runnable * temp disable test nodeflow; unified test for kernel * cuda test kernel done
* [Kernel] Update kernel branch (#550)
* [Model] add multiprocessing training with sampling. (#484) * reorganize sampling code. * add multi-process training. * speed up gcn_cv * fix graphsage_cv. * add new API in graph store. * update barrier impl. * support both local and distributed training. * fix multiprocess train. * fix. * fix barrier. * add script for loading data. * multiprocessing sampling. * accel training. * replace pull with spmv for speedup. * nodeflow copy from parent with context. * enable GPU. * fix a bug in graph store. * enable multi-GPU training. * fix lint. * add comments. * rename to run_store_server.py * fix gcn_cv. * fix a minor bug in sampler. * handle error better in graph store. * improve graphsage_cv for distributed mode. * update README. * fix. * update.
* [Tutorial] add sampling tutorial. (#522) * add sampling tutorial. * add readme * update author list. * fix indent in the code. * rename the file. * update tutorial. * fix the last API. * update image.
* [BUGFIX] fix the problems in the sampling tutorial. (#523) * add index. * update. * update tutorial.
* fix gpu utest * cuda utest runnable * temp disable test nodeflow; unified test for kernel * cuda test kernel done
* Fixing typo in JTNN after interface change (#536)
* [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515)
* [Bug Fix] Fix inplace op at backend (#546) * Fix inplace operation * fix line separator
* [Feature] Add batch and unbatch for immutable graph (#539) * Add batch and unbatch for immutable graph * fix line separator * fix lintr * remove unnecessary include * fix code review
* [BUGFix] Improve multi-processing training (#526) * fix. * add comment. * remove. * temp fix. * initialize for shared memory. * fix graphsage. * fix gcn. * add more unit tests. * add more tests. * avoid creating shared-memory exclusively. * redefine remote initializer. * improve initializer. * fix unit test. * fix lint. * fix lint. * initialize data in the graph store server properly. * fix test. * fix test. * fix test. * small fix. * add comments. * cleanup server. * test graph store with a random port. * print. * print to stderr. * test1 * test2 * remove comment. * adjust the initializer signature.
* [API] update graph store API. (#549) * add init_ndata and init_edata in DGLGraph. * adjust SharedMemoryGraph API. * print warning. * fix comment. * update example * fix. * fix examples. * add unit tests. * add comments.
* [Refactor] Immutable graph index (#543) * WIP * header * WIP .cc * WIP * transpose * wip * immutable graph .h and .cc * WIP: nodeflow.cc * compile * remove all tmp dl managed ctx; they caused refcount issue * one simple test * WIP: testing * test_graph * fix graph index * fix bug in sampler; pass pytorch utest * WIP on mxnet * fix lint * fix mxnet unittest w/ unfortunate workaround * fix msvc * fix lint * SliceRows and test_nodeflow * resolve reviews * resolve reviews * try fix win ci * try fix win ci * poke win ci again * poke * lazy multigraph flag; stackoverflow error * revert node subgraph test * lazy object * try fix win build * try fix win build * poke ci * fix build script * fix compile * add a todo * fix reviews * fix compile
* [Kernel] Update kernel branch (#576)
* [Model] add multiprocessing training with sampling. (#484) * reorganize sampling code. * add multi-process training. * speed up gcn_cv * fix graphsage_cv. * add new API in graph store. * update barrier impl. * support both local and distributed training. * fix multiprocess train. * fix. * fix barrier. * add script for loading data. * multiprocessing sampling. * accel training. * replace pull with spmv for speedup. * nodeflow copy from parent with context. * enable GPU. * fix a bug in graph store. * enable multi-GPU training. * fix lint. * add comments. * rename to run_store_server.py * fix gcn_cv. * fix a minor bug in sampler. * handle error better in graph store. * improve graphsage_cv for distributed mode. * update README. * fix. * update.
* [Tutorial] add sampling tutorial. (#522) * add sampling tutorial. * add readme * update author list. * fix indent in the code. * rename the file. * update tutorial. * fix the last API. * update image.
* [BUGFIX] fix the problems in the sampling tutorial. (#523) * add index. * update. * update tutorial.
* fix gpu utest * cuda utest runnable * temp disable test nodeflow; unified test for kernel * cuda test kernel done
* Fixing typo in JTNN after interface change (#536)
* [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515)
* [Bug Fix] Fix inplace op at backend (#546) * Fix inplace operation * fix line separator
* [Feature] Add batch and unbatch for immutable graph (#539) * Add batch and unbatch for immutable graph * fix line separator * fix lintr * remove unnecessary include * fix code review
* [BUGFix] Improve multi-processing training (#526) * fix. * add comment. * remove. * temp fix. * initialize for shared memory. * fix graphsage. * fix gcn. * add more unit tests. * add more tests. * avoid creating shared-memory exclusively. * redefine remote initializer. * improve initializer. * fix unit test. * fix lint. * fix lint. * initialize data in the graph store server properly. * fix test. * fix test. * fix test. * small fix. * add comments. * cleanup server. * test graph store with a random port. * print. * print to stderr. * test1 * test2 * remove comment. * adjust the initializer signature.
* [API] update graph store API. (#549) * add init_ndata and init_edata in DGLGraph. * adjust SharedMemoryGraph API. * print warning. * fix comment. * update example * fix. * fix examples. * add unit tests. * add comments.
* [Refactor] Immutable graph index (#543) * WIP * header * WIP .cc * WIP * transpose * wip * immutable graph .h and .cc * WIP: nodeflow.cc * compile * remove all tmp dl managed ctx; they caused refcount issue * one simple test * WIP: testing * test_graph * fix graph index * fix bug in sampler; pass pytorch utest * WIP on mxnet * fix lint * fix mxnet unittest w/ unfortunate workaround * fix msvc * fix lint * SliceRows and test_nodeflow * resolve reviews * resolve reviews * try fix win ci * try fix win ci * poke win ci again * poke * lazy multigraph flag; stackoverflow error * revert node subgraph test * lazy object * try fix win build * try fix win build * poke ci * fix build script * fix compile * add a todo * fix reviews * fix compile
* all demo use python-3 (#555)
* [DEMO] Reproduce numbers of distributed training in AMLC giant graph paper (#556) * update * update * update * update num_hops * fix bug * update * report numbers of distributed training in AMLC giant graph paper
* [DEMO] Remove duplicate code for sampling (#557) * update * update * re-use single-machine code * update * use relative path * update * update * update * add __init__.py * add __init__.py * import sys, os * fix typo * update
* [Perf] Improve performance of graph store. (#554) * fix. * use inplace. * move to shared memory graph store. * fix. * add more unit tests. * fix. * fix test. * fix test. * disable test. * fix.
* [BUGFIX] fix a bug in edge_ids (#560) * add test. * fix compute. * fix test. * turn on test. * fix a bug. * add test. * fix. * disable test.
* [DEMO] Add Pytorch demo for distributed sampler (#562) * update * update * update * add sender * update * remove duplicate code
* [Test] Add gtest to project (#547) * add gtest module * add gtest * fix * Update CMakeLists.txt * Update README.md
* [Perf] lazily create msg_index. (#563) * lazily create msg_index. * update test.
* [BUGFIX] fix bugs for running GCN on giant graphs. (#561) * load mxnet csr. * enable load large csr. * fix * fix. * fix int overflow. * fix test.
* [BugFix] Fix error when bfs_level = 0 in Entity Classification with RGCN (#559)
* [DEMO] Update demo of distributed sampler (#564) * update * update * update demo
* add network cpp test (#565)
* Add unittest for C++ RPC (#566)
* [CI] Fix CI for cpp test (#570) * fix CI for cpp test * update port number
* [Docker] update docker image (#575) * update docker image * specify lint version * rm torch import from unified tests
* [Kernel][Scheduler][MXNet] Scheduler for DGL kernels and MXNet backend support (#541)
* [Model] add multiprocessing training with sampling. (#484) * reorganize sampling code. * add multi-process training. * speed up gcn_cv * fix graphsage_cv. * add new API in graph store. * update barrier impl. * support both local and distributed training. * fix multiprocess train. * fix. * fix barrier. * add script for loading data. * multiprocessing sampling. * accel training. * replace pull with spmv for speedup. * nodeflow copy from parent with context. * enable GPU. * fix a bug in graph store. * enable multi-GPU training. * fix lint. * add comments. * rename to run_store_server.py * fix gcn_cv. * fix a minor bug in sampler. * handle error better in graph store. * improve graphsage_cv for distributed mode. * update README. * fix. * update.
* [Tutorial] add sampling tutorial. (#522) * add sampling tutorial. * add readme * update author list. * fix indent in the code. * rename the file. * update tutorial. * fix the last API. * update image.
* [BUGFIX] fix the problems in the sampling tutorial. (#523) * add index. * update. * update tutorial.
* fix gpu utest * cuda utest runnable * temp disable test nodeflow; unified test for kernel * cuda test kernel done * edge softmax module * WIP
* Fixing typo in JTNN after interface change (#536) * mxnet backend support * improve reduce grad * add max to unittest backend * fix kernel unittest
* [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515) * lint * lint * win build
* [Bug Fix] Fix inplace op at backend (#546) * Fix inplace operation * fix line separator
* [Feature] Add batch and unbatch for immutable graph (#539) * Add batch and unbatch for immutable graph * fix line separator * fix lintr * remove unnecessary include * fix code review
* [BUGFix] Improve multi-processing training (#526) * fix. * add comment. * remove. * temp fix. * initialize for shared memory. * fix graphsage. * fix gcn. * add more unit tests. * add more tests. * avoid creating shared-memory exclusively. * redefine remote initializer. * improve initializer. * fix unit test. * fix lint. * fix lint. * initialize data in the graph store server properly. * fix test. * fix test. * fix test. * small fix. * add comments. * cleanup server. * test graph store with a random port. * print. * print to stderr. * test1 * test2 * remove comment. * adjust the initializer signature. * try * fix * fix * fix * fix * fix * try * test * test * test * try * try * try * test * fix * try gen_target * fix gen_target * fix msvc var_args expand issue * fix
* [API] update graph store API. (#549) * add init_ndata and init_edata in DGLGraph. * adjust SharedMemoryGraph API. * print warning. * fix comment. * update example * fix. * fix examples. * add unit tests. * add comments.
* [Refactor] Immutable graph index (#543) * WIP * header * WIP .cc * WIP * transpose * wip * immutable graph .h and .cc * WIP: nodeflow.cc * compile * remove all tmp dl managed ctx; they caused refcount issue * one simple test * WIP: testing * test_graph * fix graph index * fix bug in sampler; pass pytorch utest * WIP on mxnet * fix lint * fix mxnet unittest w/ unfortunate workaround * fix msvc * fix lint * SliceRows and test_nodeflow * resolve reviews * resolve reviews * try fix win ci * try fix win ci * poke win ci again * poke * lazy multigraph flag; stackoverflow error * revert node subgraph test * lazy object * try fix win build * try fix win build * poke ci * fix build script * fix compile * add a todo * fix reviews * fix compile * WIP * WIP
* all demo use python-3 (#555) * ToImmutable and CopyTo
* [DEMO] Reproduce numbers of distributed training in AMLC giant graph paper (#556) * update * update * update * update num_hops * fix bug * update * report numbers of distributed training in AMLC giant graph paper
* [DEMO] Remove duplicate code for sampling (#557) * update * update * re-use single-machine code * update * use relative path * update * update * update * add __init__.py * add __init__.py * import sys, os * fix typo * update
* [Perf] Improve performance of graph store. (#554) * fix. * use inplace. * move to shared memory graph store. * fix. * add more unit tests. * fix. * fix test. * fix test. * disable test. * fix.
* [BUGFIX] fix a bug in edge_ids (#560) * add test. * fix compute. * fix test. * turn on test. * fix a bug. * add test. * fix. * disable test. * DGLRetValue DGLContext conversion
* [DEMO] Add Pytorch demo for distributed sampler (#562) * update * update * update * add sender * update * remove duplicate code
* [Test] Add gtest to project (#547) * add gtest module * add gtest * fix * Update CMakeLists.txt * Update README.md * Add support to convert immutable graph to 32 bits
* [Perf] lazily create msg_index. (#563) * lazily create msg_index. * update test. * fix binary reduce following new minigun template * enable both int64 and int32 kernels
* [BUGFIX] fix bugs for running GCN on giant graphs. (#561) * load mxnet csr. * enable load large csr. * fix * fix. * fix int overflow. * fix test. * new kernel interface done for CPU * docstring * rename & docstring * copy reduce and backward
* [BugFix] Fix error when bfs_level = 0 in Entity Classification with RGCN (#559)
* [DEMO] Update demo of distributed sampler (#564) * update * update * update demo * adapt cuda kernels to the new interface
* add network cpp test (#565) * fix bug
* Add unittest for C++ RPC (#566)
* [CI] Fix CI for cpp test (#570) * fix CI for cpp test * update port number
* [Docker] update docker image (#575) * update docker image * specify lint version * rm torch import from unified tests * remove pytorch-specific test_function * fix unittest * fix * fix unittest backend bug in converting tensor to numpy array * fix * mxnet version
* [BUGFIX] fix for MXNet 1.5. (#552) * remove clone. * turn on numpy compatible. * Revert "remove clone." This reverts commit 17bbf76ed72ff178df6b3f35addc428048672457.
* revert format changes * fix mxnet api name * revert mistakes in previous revert * roll back CI to 20190523 build * fix unittest * disable test_shared_mem_store.py for now * remove mxnet/test_specialization.py * sync win64 test script * fix lowercase * missing backend in gpu unit test * transpose to get forward graph * pass update all * add sanity check * passing test_specialization.py * fix and pass test_function * fix check * fix pytorch softmax * mxnet kernels * c++ lint * pylint * try * win build * fix * win * ci enable gpu build * init submodule recursively * backend docstring * try * test win dev * doc string * disable pytorch test_nn * try to fix windows issue * bug fixed, revert changes
* [Test] fix CI. (#586) * disable unit test in mxnet tutorial. * retry socket connection. * roll back to set_np_compat * try to fix multi-processing test hangs when it fails. * fix test. * fix. * doc string * doc string and clean up * missing field in ctypes * fix node flow schedule and unit test * rename * pylint * copy from parent default context * fix unit test script * fix * demo bug in nodeflow gpu test
* [Kernel][Bugfix] fix nodeflow bug (#604) * fix nodeflow bug * remove debug code * add build gtest option * fix cmake; fix graph index bug in spmv.py * remove clone * fix div rhs grad bug
* [Kernel] Support full builtin method, edge softmax and unit tests (#605) * add full builtin support * unit test * unit test backend * edge softmax * apply edge with builtin * fix kernel unit test * disable mxnet test_shared_mem_store * gen builtin reduce * enable mxnet gpu unittest * revert some changes * docstring * add note for the hack
* [Kernel][Unittest][CI] Fix MXNet GPU CI (#607) * update docker image for MXNet GPU CI * force all dgl graph input and output on CPU * fix gpu unittest * speedup compilation * add some comments * lint * add more comments * fix as requested * add some comments * comment * lint * lint * update pylint * fix as requested * lint * lint * lint * docstrings of python DGL kernel entries * disable lint warnings on arguments in kernel.py * fix docstring in scheduler * fix some bug in unittest; try again * Revert "Merge branch 'kernel' of github.com:zzhang-cn/dgl into kernel" This reverts commit 1d2299e68b004182ea6130b088de1f1122b18a49, reversing changes made to ddc97fbf1bec2b7815c0da7c74f7ecb2f428889b. * Revert "fix some bug in unittest; try again" This reverts commit ddc97fbf1bec2b7815c0da7c74f7ecb2f428889b. * more comprehensive kernel test * remove shape check in test_specialization
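This merge introduced the builtin message/reduce functions and edge softmax that later commits keep refining. Through today's public DGL API they look roughly like the following usage sketch (the kernel code itself lives in C++/CUDA):

```python
# Usage-level sketch of what the fused-kernel branch accelerates: builtin
# message/reduce functions and edge softmax via the public DGL API.
import torch
import dgl
import dgl.function as fn
from dgl.nn.functional import edge_softmax

g = dgl.rand_graph(100, 500)
g.ndata["h"] = torch.randn(100, 16)

# copy_u + sum lowers to a fused SpMM-style kernel instead of materializing
# per-edge messages in memory.
g.update_all(fn.copy_u("h", "m"), fn.sum("m", "h_sum"))

# edge_softmax normalizes scores over each node's incoming edges in one kernel.
scores = torch.randn(g.num_edges(), 1)
alpha = edge_softmax(g, scores)
```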