- 14 Jan, 2021 1 commit
-
-
Quan (Andy) Gan authored
* fix munmap() using wrong parameter * rename variables
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 26 Dec, 2020 2 commits
-
-
Da Zheng authored
* delete shared memory when receiving a signal. * rename. * fix lint. * fix lint. * fix compile. * Fix. * we need to report error if the shared memory exists. * disable tensorflow test for shared memory. * revert.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-202.us-west-1.compute.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
Quan (Andy) Gan authored
-
- 25 Dec, 2020 1 commit
-
-
Quan (Andy) Gan authored
* first commit * some thoughts * move around * more commit * more fixes * now it uses torch allocator * fix symbol export error * fix * fixes * test fix * add script * building separate library per version * fix for vs2019 * more fixes * fix on windows build * update jenkinsfile * auto copy built dlls for windows * lint and installation guide update * fix * specify conda environment * set environment for ci * fix * fix * fix * fix again * revert * fix cmake * fix * switch to using python interpreter path * remove scripts * debug * oops sorry * Update index.rst * Update index.rst * copies automatically, no need for this * do not print message if library not found * tiny fixes * debug on nightly * replace add_compile_definitions to make CMake 3.5 happy * fix linking to wrong lib for multiple pytorch envs * changed building strategy * fix nightly * fix windows * fix windows again * setup bugfix * address comments * change README
-
- 02 Nov, 2020 1 commit
-
-
nv-dlasalle authored
* Update docs * Make non-default streams non-blocking
-
- 10 Sep, 2020 2 commits
-
-
Zihao Ye authored
* upd * upd * upd * upd * upd * upd * upd * upd * upd * upd * upd * fix * upd * upd * upd * upd * fix * upd
Co-authored-by: VoVAllen <jz1749@nyu.edu>
-
Zihao Ye authored
* upd * upd * upd * upd * lint * upd * upd * fmt
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
-
- 30 Aug, 2020 1 commit
-
-
Zihao Ye authored
* upd * fix
Co-authored-by: Quan Gan <coin2028@hotmail.com>
-
- 11 Aug, 2020 1 commit
-
-
chwan-rice authored
* fix OpenMP compatibility issues * typo
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 09 Aug, 2020 1 commit
-
-
Da Zheng authored
* temp fix omp. * set server threads. * add CAPI to set up OMP threads. * fix. * fix. * update namespace. * set cpi properly. * allow to config num worker threads. * set #threads. * fix.
-
- 28 Jul, 2020 1 commit
-
-
Qidong Su authored
* update * update * update * update * fix * update * fix * update * update * win32 * update * fix * update * update * update * update * update * update * fix * update * update * update * update * update * fix * TODO * 111 * fix * minor fix * minor fix * fix * Update shared_mem_manager.cc * update * update * update * update metis * update metis * update
Co-authored-by: VoVAllen <jz1749@nyu.edu>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 28 Jun, 2020 1 commit
-
-
Minjie Wang authored
* add cub; array cumsum * CSRSliceRows * fix warning * operator << for ndarray; CSRSliceRows * add CSRIsSorted * add csr_sort * inplace coosort and outplace csrsort * WIP: coo is sorted * mv cuda_utils * add AllTrue utility * csr sort * coo sort * coo2csr for sorted coo arrays * CSRToCOO from sorted * pass tests for the new kernel changes * cannot use inplace sort * lint * try fix msvc error * Fix g.copy_to and g.asnumbits; ToBlock no longer uses CSC * stash * revert some hack * revert some changes * address comments * fix * fix to_block unittest * add todo note
-
- 15 Jun, 2020 1 commit
-
-
Minjie Wang authored
* add cuda source * moving codes from kernel2 branch * operator overloading * Better error message for unsupported device * fix c tests * coo sort using cusparse * move test_rpc to distributed * lint * address comments and add utests
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Chao Ma <mctt90@gmail.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 26 May, 2020 1 commit
-
-
Jinjing Zhou authored
* WIP: rpc components * client & server * move network package to rpc * fix include * fix compile * c api * wip: test * add basic tests * missing file * [RPC] Zero copy serializer (#1517) * zerocopy serialization * add test for HeteroGraph * fix lint * remove unnecessary codes * add comment * lint * lint * disable pylint for now * add include for win * windows guard * lint * lint * skip test on windows * refactor * add comment * fix * comment * 1111 * fix * Update Jenkinsfile * [RPC] Implementation of RPC infra (#1544) * update * update * update * update * update * update * update * update * update * update * update * update * update * update * remove client.cc and server.cc * fix lint * update * update * fix lint * update * fix lint * update * update * update * update * update * update * update test * update * update test * update * update * update * update * update * update * update * update * update * update * update * update comment * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * fix lint * fix lint * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * Refactor StreamWithBuffer (#1550) * refactor * fix with new interface * remove copy * fix * remove comment
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Chao Ma <mctt90@gmail.com>
-
- 22 May, 2020 1 commit
-
-
Chao Ma authored
* WIP: rpc components * client & server * move network package to rpc * fix include * fix compile * c api * wip: test * add basic tests * missing file * [RPC] Zero copy serializer (#1517) * zerocopy serialization * add test for HeteroGraph * fix lint * remove unnecessary codes * add comment * lint * lint * disable pylint for now * add include for win * windows guard * lint * lint * skip test on windows * refactor * add comment * fix * comment * 1111 * fix * Update Jenkinsfile * [RPC] Implementation of RPC infra (#1544) * update * update * update * update * update * update * update * update * update * update * update * update * update * update * remove client.cc and server.cc * fix lint * update * update * fix lint * update * fix lint * update * update * update * update * update * update * update test * update * update test * update * update * update * update * update * update * update * update * update * update * update * update comment * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * fix lint * fix lint * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
-
- 03 May, 2020 1 commit
-
-
Da Zheng authored
* initial version from distributed training. This is copied from multiprocessing training. * modify for distributed training. * it's runnable now. * measure time in neighbor sampling. * simplify neighbor sampling. * fix a bug in distributed neighbor sampling. * allow single-machine training. * fix a bug. * fix a bug. * fix openmp. * make some improvement. * fix. * add prepare in the sampler. * prepare nodeflow async. * fix a bug. * get id. * simplify the code. * improve. * fix partition.py * fix the example. * add more features. * fix the example. * allow one partition * use distributed kvstore. * do g2l map manually. * fix commandline. * a temp script to save reddit. * fix pull_handler. * add pytorch version. * estimate the time for copying data. * delete unused code. * fix a bug. * print id. * fix a bug * fix a bug * fix a bug. * remove redundent code. * revert modify in sampler. * fix temp script. * remove pytorch version. * fix. * distributed training with pytorch. * add distributed graph store. * fix. * add metis_partition_assignment. * fix a few bugs in distributed graph store. * fix test. * fix bugs in distributed graph store. * fix tests. * remove code of defining DistGraphStore. * fix partition. * fix example. * update run.sh. * only read necessary node data. * batching data fetch of multiple NodeFlows. * simplify gcn. * remove unnecessary code. * use the new copy_from_kvstore. * update training script. * print time in graphsage. * make distributed training runnable. * use val_nid. * fix train_sampling. * add distributed training. * add run.sh * add more timing. * fix a bug. * save graph metadata when partition. * create ndata and edata in distributed graph store. * add timing in minibatch training of GraphSage. * use pytorch distributed. * add checks. * fix a bug in global vs. local ids. * remove fast pull * fix a compile error. * update and add new APIs. * implement more methods in DistGraphStore. * update more APIs. * rename it to DistGraph. 
* rename to DistTensor * remove some unnecessary API. * remove unnecessary files. * revert changes in sampler. * Revert "simplify gcn." This reverts commit 0ed3a34ca714203a5b45240af71555d4227ce452. * Revert "simplify neighbor sampling." This reverts commit 551c72d20f05a029360ba97f312c7a7a578aacec. * Revert "measure time in neighbor sampling." This reverts commit 63ae80c7b402bb626e24acbbc8fdfe9fffd0bc64. * Revert "add timing in minibatch training of GraphSage." This reverts commit e59dc8957a414c7df5c316f51d78bce822bdef5e. * Revert "fix train_sampling." This reverts commit ea6aea9a4aabb8ba0ff63070aa51e7ca81536ad9. * fix lint. * add comments and small update. * add more comments. * add more unit tests and fix bugs. * check the existence of shared-mem graph index. * use new partitioned graph storage. * fix bugs. * print error in fast pull. * fix lint * fix a compile error. * save absolute path after partitioning. * small fixes in the example * Revert "[kvstore] support any data type for init_data() (#1465)" This reverts commit 87b6997b. * fix a bug. * disable evaluation. * Revert "Revert "[kvstore] support any data type for init_data() (#1465)"" This reverts commit f5b8039c6326eb73bad8287db3d30d93175e5bee. * support set and init data. * support set and init data. * Revert "Revert "[kvstore] support any data type for init_data() (#1465)"" This reverts commit f5b8039c6326eb73bad8287db3d30d93175e5bee. * fix bugs. * fix unit test. * move to dgl.distributed. * fix lint. * fix lint. * remove local_nids. * fix lint. * fix test. * remove train_dist. * revert train_sampling. * rename funcs. * address comments. * address comments. Use NodeDataView/EdgeDataView to keep track of data. * address comments. * address comments. * revert. * save data with DGL serializer. * use the right way of getting shape. * fix lint. * address comments. * address comments. * fix an error in mxnet. * address comments. * add edge_map. * add more test and fix bugs.
Co-authored-by: Zheng <dzzhen@186590dc80ff.ant.amazon.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-6-131.us-east-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-26-167.us-east-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-150.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-250.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-135.us-west-2.compute.internal>
-
- 30 Mar, 2020 1 commit
-
-
Jinjing Zhou authored
* TF backend fix and new logic to choose backend * fix * fix * fix * fix * fix backend * fix * dlpack alignment * add flag * flag * lint * lint * remove unused * several fixes
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 09 Mar, 2020 1 commit
-
-
Jinjing Zhou authored
* patch * fix+1 * add readme * fix error message
-
- 07 Mar, 2020 1 commit
-
-
Quan (Andy) Gan authored
* add num nodes in ctors * fix * lint * addresses comments * replace with constexpr * remove function with rvalue reference * address comments
-
- 05 Mar, 2020 1 commit
-
-
Minjie Wang authored
This commit fixes a bug where the lock guard (which protects concurrent access to the same scope from different threads) had essentially no effect because it was bound only to a temporary.
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
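For readers unfamiliar with this class of bug, the following is a minimal sketch of the general pattern, not the code changed by this commit. `SharedCounter`, `Increment`, and `RunGuarded` are hypothetical names chosen for illustration. A `std::lock_guard` created as an unnamed temporary is destroyed at the end of its own statement, so the mutex is released before the protected code runs; giving the guard a name keeps the mutex held until the end of the enclosing scope.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state; the names are illustrative, not from the commit.
struct SharedCounter {
  std::mutex mu;
  std::int64_t value = 0;

  // Ineffective pattern (shown as a comment only):
  //
  //   std::lock_guard<std::mutex>{mu};  // unnamed temporary
  //   ++value;
  //
  // The temporary guard is destroyed at the end of its own statement, so the
  // mutex is already released when ++value runs.

  // Effective pattern: a named guard lives until the end of the scope.
  void Increment() {
    std::lock_guard<std::mutex> lock(mu);
    ++value;
  }
};

// Starts num_threads threads that each call Increment() iters times,
// waits for all of them, and returns the final counter value.
std::int64_t RunGuarded(int num_threads, int iters) {
  assert(num_threads >= 0 && iters >= 0);
  SharedCounter counter;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, iters] {
      for (int i = 0; i < iters; ++i) counter.Increment();
    });
  }
  for (auto& w : workers) w.join();
  return counter.value;
}
```

With the named guard, a call such as `RunGuarded(4, 10000)` returns exactly `num_threads * iters`, because every increment happens while the mutex is held. On most toolchains the program must be linked with thread support (for example `-pthread`).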
-
- 02 Mar, 2020 1 commit
-
-
Minjie Wang authored
* improve performance of sample_neighbors * some more improve * test script * benchmarks * multi process * update more tests * WIP * adding two API for state saving * add create from state * upd test * missing file * wip: pickle/unpickle * more c apis * find the problem of empty data array * add null array; pickling speed is bad * still bad perf * still bad perf * wip * fix the pickle speed test; now everything looks good * minor fix * bugfix * some lint fix * address comments * more fix * fix lint * add utest for random.choice * add utest for dgl.rand_graph * fix cpp utests * try fix ci * fix bug in TF backend * upd choice docstring * address comments * upd * try fix compile * add comment
-
- 31 Jan, 2020 1 commit
-
-
Quan (Andy) Gan authored
* trying to refactor IndexSelect * partial implementation * add index select and assign for floats as well * move to random choice source * more updates * fixes * fixes * more fixes * adding python impl * fixes * unit test * lint * lint x2 * lint x3 * update metapath2vec * debugging performance * still debugging for performance * tuning * switching to succvec * redo * revert non-uniform sampler to use vector * still not fast * why does this crash with OpenMP??? * because there was a data race!!! * add documentations and remove assign op * lint * lint x2 * lol what have i done * lint x3 * fix and disable gpu testing * bugfix * generic random walk * reorg the random walk source code * Update randomwalks.h * Update randomwalks_cpu.cc * rename file * move internal function to anonymous ns * reorg & docstrings * constant restart probability * docstring fix * more commit * random walk with restart, tested * some fixes * switch to using NDArray for choice * massive fix & docstring * lint x? * lint x?? * fix * export symbols * skip gpu test * addresses comments * replaces another VecToIdArray * add randomwalks.h to include * replace void * with template
-
- 23 Dec, 2019 1 commit
-
-
VoVAllen authored
* tf * add builtin support * fix * pytest * fix * fix * fix some bugs * fix selecting * fix todo * fix test * fix test fail in tf * fix * fix * fix gather row * fix gather row * log backend * fix gather row * fix gather row * fix for pytorch * fix * fix * fix * fix * fix * fix tests * fix * fix * fix * fix * fix * fix * fix convert * fix * fix * fix * fix inplace * add alignment setting * add debug option * Revert "add alignment setting" This reverts commit ec63fb3506ea84fff7d447a1fbdfd1d5d1fb6110. * tf ci * fix lint * fix lint * add tfdlpack * fix type * add env * fix backend * fix * fix tests * remove one_hot * remove comment * remove comment * fix * use pip to install all * fix test * fix base * fix * fix * add skip * upgrade cmake * change version * change ci * fix * fix * fix * fix * fix seg fault * fix * fix python version * fix * try fix * fix * fix * tf takes longer time in ci * change py version * fix * fix * fix oom * change kg env * change kg env * Aaaaaaaaaargh * I am never messing with all these messy environments again... * use pytest * Change image * try node * try * fix * try new ci * try new ci * try new ci * try new ci * ci * try * try * fix * hot fix * fix * fix cpp test * add comments
-
- 20 Dec, 2019 1 commit
-
-
VoVAllen authored
* tf * add builtin support * fix * pytest * fix * fix * fix some bugs * fix selecting * fix todo * fix test * fix test fail in tf * fix * fix * fix gather row * fix gather row * log backend * fix gather row * fix gather row * fix for pytorch * fix * fix * fix * fix * fix * fix tests * fix * fix * fix * fix * fix * fix * fix convert * fix * fix * fix * fix inplace * add alignment setting * add debug option * Revert "add alignment setting" This reverts commit ec63fb3506ea84fff7d447a1fbdfd1d5d1fb6110. * tf ci * fix lint * fix lint * add tfdlpack * fix type * add env * fix backend * fix * fix tests * remove one_hot * remove comment * remove comment * fix * use pip to install all * fix test * fix base * fix * fix * add skip * upgrade cmake * change version * change ci * fix * fix * fix * fix * fix seg fault * fix * fix python version * fix * try fix * fix * fix * tf takes longer time in ci * change py version * fix * fix * fix oom * change kg env * change kg env * Aaaaaaaaaargh * I am never messing with all these messy environments again... * use pytest * Change image
-
- 17 Sep, 2019 1 commit
-
-
Minjie Wang authored
* WIP. remove graph arg in NodeBatch and EdgeBatch * refactor: use graph adapter for scheduler * WIP: recv * draft impl * stuck at bipartite * bipartite->unitgraph; support dsttype == srctype * pass test_query * pass test_query * pass test_view * test apply * pass udf message passing tests * pass quan's test using builtins * WIP: wildcard slicing * new construct methods * broken * good * add stack cross reducer * fix bug; fix mx * fix bug in csrmm2 when the CSR is not square * lint * removed FlattenedHeteroGraph class * WIP * prop nodes, prop edges, filter nodes/edges * add DGLGraph tests to heterograph. Fix several bugs * finish nx<->hetero graph conversion * create bipartite from nx * more spec on hetero/homo conversion * silly fixes * check node and edge types * repr * to api * adj APIs * inc * fix some lints and bugs * fix some lints * hetero/homo conversion * fix flatten test * more spec in hetero_from_homo and test * flatten using concat names * WIP: creators * rewrite hetero_from_homo in a more efficient way * remove useless variables * fix lint * subgraphs and typed subgraphs * lint & removed heterosubgraph class * lint x2 * disable heterograph mutation test * docstring update * add edge id for nx graph test * fix mx unittests * fix bug * try fix * fix unittest when cross_reducer is stack * fix ci * fix nx bipartite bug; docstring * fix scipy creation bug * lint * fix bug when converting heterograph from homograph * fix bug in hetero_from_homo about ntype order * trailing white * docstring fixes for add_foo and data views * docstring for relation slice * to_hetero and to_homo with feature support * lint * lint * DGLGraph compatibility * incidence matrix & docstring fixes * example string fixes * feature in hetero_from_relations * deduplication of edge types in to_hetero * fix lint * fix
-
- 01 Jul, 2019 1 commit
-
-
Minjie Wang authored
* WIP: import tvm runtime node system * WIP: object system * containers * tested basic container composition * tested custom object * fix setattr bug * tested object container return * fix lint * some comments about get/set state * fix lint * fix lint * update cython * fix cython * ffi doc * fix doc
-
- 06 Jun, 2019 1 commit
-
-
Lingfan Yu authored
* [Kernel] Minigun integration and fused kernel support (#519) * kernel interface * add minigun * Add cuda build * functors * working on binary elewise * binary reduce * change kernel interface * WIP * wip * fix minigun * compile * binary reduce kernels * compile * simple test passed * more reducers * fix thrust problem * fix cmake * fix cmake; add proper guard for atomic * WIP: bcast * WIP * bcast kernels * update to new minigun pass-by-value practice * broadcasting dim * add copy src and copy edge * fix linking * fix none array problem * fix copy edge * add device_type and device_id to backend operator * cache csr adj, remove cache for adjmat and incmat * custom ops in backend and pytorch impl * change dgl-mg kernel python interface * add id_mapping var * clean up plus v2e spmv schedule * spmv schedule & clean up fall back * symbolic message and reduce func, remove bundle func * new executors * new backend interface for dgl kernels and pytorch impl * minor fix * fix * fix docstring, comments, func names * nodeflow * fix message id mapping and bugs... 
* pytorch test case & fix * backward binary reduce * fix bug * WIP: cusparse * change to int32 csr for cusparse workaround * disable cusparse * change back to int64 * broadcasting backward * cusparse; WIP: add rev_csr * unit test for kernels * pytorch backward with dgl kernel * edge softmax * fix backward * improve softmax * cache edge on device * cache mappings on device * fix partial forward code * cusparse done * copy_src_sum with cusparse * rm id getter * reduce grad for broadcast * copy edge reduce backward * kernel unit test for broadcasting * full kernel unit test * add cpu kernels * edge softmax unit test * missing ref * fix compile and small bugs * fix bug in bcast * Add backward both * fix torch utests * expose infershape * create out tensor in python * fix c++ lint * [Kernel] Add GPU utest and kernel utest (#524) * fix gpu utest * cuda utest runnable * temp disable test nodeflow; unified test for kernel * cuda test kernel done * [Kernel] Update kernel branch (#550) * [Model] add multiprocessing training with sampling. (#484) * reorganize sampling code. * add multi-process training. * speed up gcn_cv * fix graphsage_cv. * add new API in graph store. * update barrier impl. * support both local and distributed training. * fix multiprocess train. * fix. * fix barrier. * add script for loading data. * multiprocessing sampling. * accel training. * replace pull with spmv for speedup. * nodeflow copy from parent with context. * enable GPU. * fix a bug in graph store. * enable multi-GPU training. * fix lint. * add comments. * rename to run_store_server.py * fix gcn_cv. * fix a minor bug in sampler. * handle error better in graph store. * improve graphsage_cv for distributed mode. * update README. * fix. * update. * [Tutorial] add sampling tutorial. (#522) * add sampling tutorial. * add readme * update author list. * fix indent in the code. * rename the file. * update tutorial. * fix the last API. * update image. 
* [BUGFIX] fix the problems in the sampling tutorial. (#523) * add index. * update. * update tutorial. * fix gpu utest * cuda utest runnable * temp disable test nodeflow; unified test for kernel * cuda test kernel done * Fixing typo in JTNN after interface change (#536) * [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515) * [Bug Fix] Fix inplace op at backend (#546) * Fix inplace operation * fix line seprator * [Feature] Add batch and unbatch for immutable graph (#539) * Add batch and unbatch for immutable graph * fix line seprator * fix lintr * remove unnecessary include * fix code review * [BUGFix] Improve multi-processing training (#526) * fix. * add comment. * remove. * temp fix. * initialize for shared memory. * fix graphsage. * fix gcn. * add more unit tests. * add more tests. * avoid creating shared-memory exclusively. * redefine remote initializer. * improve initializer. * fix unit test. * fix lint. * fix lint. * initialize data in the graph store server properly. * fix test. * fix test. * fix test. * small fix. * add comments. * cleanup server. * test graph store with a random port. * print. * print to stderr. * test1 * test2 * remove comment. * adjust the initializer signature. * [API] update graph store API. (#549) * add init_ndata and init_edata in DGLGraph. * adjust SharedMemoryGraph API. * print warning. * fix comment. * update example * fix. * fix examples. * add unit tests. * add comments. 
* [Refactor] Immutable graph index (#543) * WIP * header * WIP .cc * WIP * transpose * wip * immutable graph .h and .cc * WIP: nodeflow.cc * compile * remove all tmp dl managed ctx; they caused refcount issue * one simple test * WIP: testing * test_graph * fix graph index * fix bug in sampler; pass pytorch utest * WIP on mxnet * fix lint * fix mxnet unittest w/ unfortunate workaround * fix msvc * fix lint * SliceRows and test_nodeflow * resolve reviews * resolve reviews * try fix win ci * try fix win ci * poke win ci again * poke * lazy multigraph flag; stackoverflow error * revert node subgraph test * lazy object * try fix win build * try fix win build * poke ci * fix build script * fix compile * add a todo * fix reviews * fix compile * [Kernel] Update kernel branch (#576) * [Model] add multiprocessing training with sampling. (#484) * reorganize sampling code. * add multi-process training. * speed up gcn_cv * fix graphsage_cv. * add new API in graph store. * update barrier impl. * support both local and distributed training. * fix multiprocess train. * fix. * fix barrier. * add script for loading data. * multiprocessing sampling. * accel training. * replace pull with spmv for speedup. * nodeflow copy from parent with context. * enable GPU. * fix a bug in graph store. * enable multi-GPU training. * fix lint. * add comments. * rename to run_store_server.py * fix gcn_cv. * fix a minor bug in sampler. * handle error better in graph store. * improve graphsage_cv for distributed mode. * update README. * fix. * update. * [Tutorial] add sampling tutorial. (#522) * add sampling tutorial. * add readme * update author list. * fix indent in the code. * rename the file. * update tutorial. * fix the last API. * update image. * [BUGFIX] fix the problems in the sampling tutorial. (#523) * add index. * update. * update tutorial. 
* fix gpu utest * cuda utest runnable * temp disable test nodeflow; unified test for kernel * cuda test kernel done * Fixing typo in JTNN after interface change (#536) * [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515) * [Bug Fix] Fix inplace op at backend (#546) * Fix inplace operation * fix line seprator * [Feature] Add batch and unbatch for immutable graph (#539) * Add batch and unbatch for immutable graph * fix line seprator * fix lintr * remove unnecessary include * fix code review * [BUGFix] Improve multi-processing training (#526) * fix. * add comment. * remove. * temp fix. * initialize for shared memory. * fix graphsage. * fix gcn. * add more unit tests. * add more tests. * avoid creating shared-memory exclusively. * redefine remote initializer. * improve initializer. * fix unit test. * fix lint. * fix lint. * initialize data in the graph store server properly. * fix test. * fix test. * fix test. * small fix. * add comments. * cleanup server. * test graph store with a random port. * print. * print to stderr. * test1 * test2 * remove comment. * adjust the initializer signature. * [API] update graph store API. (#549) * add init_ndata and init_edata in DGLGraph. * adjust SharedMemoryGraph API. * print warning. * fix comment. * update example * fix. * fix examples. * add unit tests. * add comments. 
* [Refactor] Immutable graph index (#543) * WIP * header * WIP .cc * WIP * transpose * wip * immutable graph .h and .cc * WIP: nodeflow.cc * compile * remove all tmp dl managed ctx; they caused refcount issue * one simple test * WIP: testing * test_graph * fix graph index * fix bug in sampler; pass pytorch utest * WIP on mxnet * fix lint * fix mxnet unittest w/ unfortunate workaround * fix msvc * fix lint * SliceRows and test_nodeflow * resolve reviews * resolve reviews * try fix win ci * try fix win ci * poke win ci again * poke * lazy multigraph flag; stackoverflow error * revert node subgraph test * lazy object * try fix win build * try fix win build * poke ci * fix build script * fix compile * add a todo * fix reviews * fix compile * all demo use python-3 (#555) * [DEMO] Reproduce numbers of distributed training in AMLC giant graph paper (#556) * update * update * update * update num_hops * fix bug * update * report numbers of distributed training in AMLC giant graph paper * [DEMO] Remove duplicate code for sampling (#557) * update * update * re-use single-machine code * update * use relative path * update * update * update * add __init__.py * add __init__.py * import sys, os * fix typo * update * [Perf] Improve performance of graph store. (#554) * fix. * use inplace. * move to shared memory graph store. * fix. * add more unit tests. * fix. * fix test. * fix test. * disable test. * fix. * [BUGIFX] fix a bug in edge_ids (#560) * add test. * fix compute. * fix test. * turn on test. * fix a bug. * add test. * fix. * disable test. * [DEMO] Add Pytorch demo for distributed sampler (#562) * update * update * update * add sender * update * remove duplicate cpde * [Test] Add gtest to project (#547) * add gtest module * add gtest * fix * Update CMakeLists.txt * Update README.md * [Perf] lazily create msg_index. (#563) * lazily create msg_index. * update test. * [BUGFIX] fix bugs for running GCN on giant graphs. (#561) * load mxnet csr. * enable load large csr. 
* fix * fix. * fix int overflow. * fix test. * [BugFix] Fix error when bfs_level = 0 in Entity Classification with RGCN (#559) * [DEMO] Update demo of distributed sampler (#564) * update * update * update demo * add network cpp test (#565) * Add unittest for C++ RPC (#566) * [CI] Fix CI for cpp test (#570) * fix CI for cpp test * update port number * [Docker] update docker image (#575) * update docker image * specify lint version * rm torch import from unified tests * [Kernel][Scheduler][MXNet] Scheduler for DGL kernels and MXNet backend support (#541) * [Model] add multiprocessing training with sampling. (#484) * reorganize sampling code. * add multi-process training. * speed up gcn_cv * fix graphsage_cv. * add new API in graph store. * update barrier impl. * support both local and distributed training. * fix multiprocess train. * fix. * fix barrier. * add script for loading data. * multiprocessing sampling. * accel training. * replace pull with spmv for speedup. * nodeflow copy from parent with context. * enable GPU. * fix a bug in graph store. * enable multi-GPU training. * fix lint. * add comments. * rename to run_store_server.py * fix gcn_cv. * fix a minor bug in sampler. * handle error better in graph store. * improve graphsage_cv for distributed mode. * update README. * fix. * update. * [Tutorial] add sampling tutorial. (#522) * add sampling tutorial. * add readme * update author list. * fix indent in the code. * rename the file. * update tutorial. * fix the last API. * update image. * [BUGFIX] fix the problems in the sampling tutorial. (#523) * add index. * update. * update tutorial. 
* fix gpu utest * cuda utest runnable * temp disable test nodeflow; unified test for kernel * cuda test kernel done * edge softmax module * WIP * Fixing typo in JTNN after interface change (#536) * mxnet backend support * improve reduce grad * add max to unittest backend * fix kernel unittest * [BugFix] Fix getting src and dst id of ALL edges in NodeFlow.apply_block (#515) * lint * lint * win build * [Bug Fix] Fix inplace op at backend (#546) * Fix inplace operation * fix line seprator * [Feature] Add batch and unbatch for immutable graph (#539) * Add batch and unbatch for immutable graph * fix line seprator * fix lintr * remove unnecessary include * fix code review * [BUGFix] Improve multi-processing training (#526) * fix. * add comment. * remove. * temp fix. * initialize for shared memory. * fix graphsage. * fix gcn. * add more unit tests. * add more tests. * avoid creating shared-memory exclusively. * redefine remote initializer. * improve initializer. * fix unit test. * fix lint. * fix lint. * initialize data in the graph store server properly. * fix test. * fix test. * fix test. * small fix. * add comments. * cleanup server. * test graph store with a random port. * print. * print to stderr. * test1 * test2 * remove comment. * adjust the initializer signature. * try * fix * fix * fix * fix * fix * try * test * test * test * try * try * try * test * fix * try gen_target * fix gen_target * fix msvc var_args expand issue * fix * [API] update graph store API. (#549) * add init_ndata and init_edata in DGLGraph. * adjust SharedMemoryGraph API. * print warning. * fix comment. * update example * fix. * fix examples. * add unit tests. * add comments. 
* [Refactor] Immutable graph index (#543) * WIP * header * WIP .cc * WIP * transpose * wip * immutable graph .h and .cc * WIP: nodeflow.cc * compile * remove all tmp dl managed ctx; they caused refcount issue * one simple test * WIP: testing * test_graph * fix graph index * fix bug in sampler; pass pytorch utest * WIP on mxnet * fix lint * fix mxnet unittest w/ unfortunate workaround * fix msvc * fix lint * SliceRows and test_nodeflow * resolve reviews * resolve reviews * try fix win ci * try fix win ci * poke win ci again * poke * lazy multigraph flag; stackoverflow error * revert node subgraph test * lazy object * try fix win build * try fix win build * poke ci * fix build script * fix compile * add a todo * fix reviews * fix compile * WIP * WIP * all demo use python-3 (#555) * ToImmutable and CopyTo * [DEMO] Reproduce numbers of distributed training in AMLC giant graph paper (#556) * update * update * update * update num_hops * fix bug * update * report numbers of distributed training in AMLC giant graph paper * [DEMO] Remove duplicate code for sampling (#557) * update * update * re-use single-machine code * update * use relative path * update * update * update * add __init__.py * add __init__.py * import sys, os * fix typo * update * [Perf] Improve performance of graph store. (#554) * fix. * use inplace. * move to shared memory graph store. * fix. * add more unit tests. * fix. * fix test. * fix test. * disable test. * fix. * [BUGIFX] fix a bug in edge_ids (#560) * add test. * fix compute. * fix test. * turn on test. * fix a bug. * add test. * fix. * disable test. * DGLRetValue DGLContext conversion * [DEMO] Add Pytorch demo for distributed sampler (#562) * update * update * update * add sender * update * remove duplicate cpde * [Test] Add gtest to project (#547) * add gtest module * add gtest * fix * Update CMakeLists.txt * Update README.md * Add support to convert immutable graph to 32 bits * [Perf] lazily create msg_index. (#563) * lazily create msg_index. 
* update test. * fix binary reduce following new minigun template * enable both int64 and int32 kernels * [BUGFIX] fix bugs for running GCN on giant graphs. (#561) * load mxnet csr. * enable load large csr. * fix * fix. * fix int overflow. * fix test. * new kernel interface done for CPU * docstring * rename & docstring * copy reduce and backward * [BugFix] Fix error when bfs_level = 0 in Entity Classification with RGCN (#559) * [DEMO] Update demo of distributed sampler (#564) * update * update * update demo * adapt cuda kernels to the new interface * add network cpp test (#565) * fix bug * Add unittest for C++ RPC (#566) * [CI] Fix CI for cpp test (#570) * fix CI for cpp test * update port number * [Docker] update docker image (#575) * update docker image * specify lint version * rm torch import from unified tests * remove pytorch-specific test_function * fix unittest * fix * fix unittest backend bug in converting tensor to numpy array * fix * mxnet version * [BUGFIX] fix for MXNet 1.5. (#552) * remove clone. * turn on numpy compatible. * Revert "remove clone." This reverts commit 17bbf76ed72ff178df6b3f35addc428048672457. * revert format changes * fix mxnet api name * revert mistakes in previous revert * roll back CI to 20190523 build * fix unittest * disable test_shared_mem_store.py for now * remove mxnet/test_specialization.py * sync win64 test script * fix lowercase * missing backend in gpu unit test * transpose to get forward graph * pass update all * add sanity check * passing test_specialization.py * fix and pass test_function * fix check * fix pytorch softmax * mxnet kernels * c++ lint * pylint * try * win build * fix * win * ci enable gpu build * init submodule recursively * backend docstring * try * test win dev * doc string * disable pytorch test_nn * try to fix windows issue * bug fixed, revert changes * [Test] fix CI. (#586) * disable unit test in mxnet tutorial. * retry socket connection. 
* roll back to set_np_compat * try to fix multi-processing test hangs when it fails. * fix test. * fix. * doc string * doc string and clean up * missing field in ctypes * fix node flow schedule and unit test * rename * pylint * copy from parent default context * fix unit test script * fix * demo bug in nodeflow gpu test * [Kernel][Bugfix] fix nodeflow bug (#604) * fix nodeflow bug * remove debug code * add build gtest option * fix cmake; fix graph index bug in spmv.py * remove clone * fix div rhs grad bug * [Kernel] Support full builtin method, edge softmax and unit tests (#605) * add full builtin support * unit test * unit test backend * edge softmax * apply edge with builtin * fix kernel unit test * disable mxnet test_shared_mem_store * gen builtin reduce * enable mxnet gpu unittest * revert some changes * docstring * add note for the hack * [Kernel][Unittest][CI] Fix MXNet GPU CI (#607) * update docker image for MXNet GPU CI * force all dgl graph input and output on CPU * fix gpu unittest * speedup compilation * add some comments * lint * add more comments * fix as requested * add some comments * comment * lint * lint * update pylint * fix as requested * lint * lint * lint * docstrings of python DGL kernel entries * disable lint warnings on arguments in kernel.py * fix docstring in scheduler * fix some bug in unittest; try again * Revert "Merge branch 'kernel' of github.com:zzhang-cn/dgl into kernel" This reverts commit 1d2299e68b004182ea6130b088de1f1122b18a49, reversing changes made to ddc97fbf1bec2b7815c0da7c74f7ecb2f428889b. * Revert "fix some bug in unittest; try again" This reverts commit ddc97fbf1bec2b7815c0da7c74f7ecb2f428889b. * more comprehensive kernel test * remove shape check in test_specialization
-
- 21 May, 2019 1 commit
-
-
Minjie Wang authored
* WIP * header * WIP .cc * WIP * transpose * wip * immutable graph .h and .cc * WIP: nodeflow.cc * compile * remove all tmp dl managed ctx; they caused refcount issue * one simple test * WIP: testing * test_graph * fix graph index * fix bug in sampler; pass pytorch utest * WIP on mxnet * fix lint * fix mxnet unittest w/ unfortunate workaround * fix msvc * fix lint * SliceRows and test_nodeflow * resolve reviews * resolve reviews * try fix win ci * try fix win ci * poke win ci again * poke * lazy multigraph flag; stackoverflow error * revert node subgraph test * lazy object * try fix win build * try fix win build * poke ci * fix build script * fix compile * add a todo * fix reviews * fix compile
-
- 20 May, 2019 1 commit
-
-
Da Zheng authored
* fix. * add comment. * remove. * temp fix. * initialize for shared memory. * fix graphsage. * fix gcn. * add more unit tests. * add more tests. * avoid creating shared-memory exclusively. * redefine remote initializer. * improve initializer. * fix unit test. * fix lint. * fix lint. * initialize data in the graph store server properly. * fix test. * fix test. * fix test. * small fix. * add comments. * cleanup server. * test graph store with a random port. * print. * print to stderr. * test1 * test2 * remove comment. * adjust the initializer signature.
-
- 08 Apr, 2019 1 commit
-
-
Da Zheng authored
* accelerate gcn_ns. * add timing. * run infer with whole graph. * distributed gcn_ns. * reconstruct gcn_ns. * minor fix. * change graphsage_cv for numa. * fix #OMP threads. * accelerate graphsage_cv. * fix a weird bug. * add profiler in graphsage_cv. * accelerate graphsage_cv. manually aggregate neighbors' embeddings with pull. * load csr directly in gcn_ns_sc. * parallel sort for graph index. * Revert "parallel sort for graph index." This reverts commit 86fe2c7117fe5e56b0d481b39849c258b166945b. * run gcn_ns_sc on GPUs. * acc gcn_cv_sc. * change gcn_cv for numa. * fix gcn_cv to use numa and gpu. * improve graphsage_cv to use numa and gpu. * improve gcn_ns. * improve graphsage_cv. * init shared memory graph store. * fix. * enable init ndata. * improve tests. * add bidirectional communication. * link to rt. * fix compilation error. * fix shared memory init. * use MessageQueue for inter-process communication. * reconstruct immutable graph csr. * fix gcn. * load csr to shared memory. * fix minor bugs. * add comments. * refactor SharedMemory. * fix bugs in ImmutableGraph. * create CSR graph from shared memory. * add more test for loading a csr graph. * terminate graph store properly. * allow initializing ndata in the graph store server. * use RPC for inter-process communication. * a script for loading a graph. * allow customizing port. * list all ndata and edata. * support dtype. * reorganize SharedMemoryGraphStore. * fix ndata shape. * reconstruct gcn_ns. * print info. * set omp in gcn_ns. * reset sampling examples. * fix lint. * fix lint. * reset gcn. * disable shared memory in windows. * fix. * fix. * reset changes. * revert nodeflow changes. * fix cmake. * fix test. * fix test. * fix test. * fix test. * add comments. * fix test. * move vector out. * fix lint. * fix lint. * move SharedMemory. * update cmake. * update comment. * fix comments. * Revert "update cmake." This reverts commit 592445e37077f70a6e3f2e5245f9a3d086b04f3b. * update cmake. * add comments. 
* rename. * change the comment. * fix a bug. * rename. * add comments. * add comments. * add init_edata. * rewrite memory alloc. * move vector to CSR. * fix. * init data. * Revert "init data." This reverts commit 2b217b9553911b7dd84a9f1d9b68430b5aa18e23. * init data. * init new columns correctly.
-
- 05 Mar, 2019 1 commit
-
-
Minjie Wang authored
* enable cython * add helper function and data structure for void_p vector return * move sampler from graph index to contrib.sampling * WIP * WIP * refactor layer sampling * pass tests * fix lint * fix graphsage * remove comments * pickle test * fix comments * update dev guide for cython build
-
- 05 Dec, 2018 1 commit
-
-
Lingfan Yu authored
* include/dgl/runtime * include * src/runtime * src/graph * src/scheduler * src * clean up CMakeLists * further clean up in cmake * install commands * python/dgl/_ffi/_cython * python/dgl/_ffi/_ctypes * python/dgl/_ffi * python/dgl * some fix * copyright
-
- 19 Oct, 2018 1 commit
-
-
Minjie Wang authored
-
- 18 Oct, 2018 1 commit
-
-
Gan Quan authored
* multigraph support on graph index * more tests * multigraph flag, bugfix on clear & copy * networkx interfaces * including graph index tests in Jenkins * node subgraph test * edge subgraphs * removing duplicates in pred/succ * more explicit test and doc * query source and destination from edge id * subgraphindex * renaming has_edge to has_edge_between, apply_edges adding eid * send_on and send_and_recv_on * DGLGraph edge subgraph * merged send_on and send_and_recv_on * change request * removing hashmap * creating multigraph by flag; mingw support * changes per request * reverting networkx auto multigraph discovery * notes on send/send_and_recv on multigraphs * changing test reducer from sum to max * added a fixme note in spmv scheduler
-
- 09 Oct, 2018 1 commit
-
-
Minjie Wang authored
-
- 08 Sep, 2018 1 commit
-
-
Minjie Wang authored
-
- 05 Sep, 2018 1 commit
-
-
Minjie Wang authored
-