Commits · e9b624fe227d2e01d3aff057b4a49f0cae58da13 · OpenDAS / dgl

11 Aug, 2022 3 commits

Merge branch 'master' into dist_part · e9b624fe
Minjie Wang authored Aug 11, 2022

e9b624fe

Adding launch script and wrapper script to trigger distributed graph … (#4276) · 8086d1ed

kylasa authored Aug 11, 2022



* Adding launch script and wrapper script to trigger distributed graph partitioning pipeline as defined in the UX document

1. dispatch_data.py is a wrapper script which builds the command and triggers the distributed partitioning pipeline
2. distgraphlaunch.py is the main Python script that triggers the pipeline; dispatch_data.py is included as a wrapper script around it to simplify its usage.

* Added code to auto-detect python version and retrieve some parameters from the input metadata json file

1. Auto-detect the Python version.
2. Read the metadata JSON file and extract some parameters to pass to the user-defined command that triggers the pipeline.
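
A rough sketch of what such a wrapper might do, for illustration only. The script name `partition_pipeline.py`, the flags `--graph-name` and `--num-parts`, and the metadata keys `graph_name` and `num_parts` are all hypothetical; the actual dispatch_data.py may read different fields and build a different command.

```python
import json
import shlex
import sys


def build_command(metadata_text, script="partition_pipeline.py"):
    """Build a shell command string from metadata JSON text.

    The script name, flags, and metadata keys are hypothetical placeholders.
    """
    meta = json.loads(metadata_text)
    # Pick the interpreter name from the running Python's major version.
    python = f"python{sys.version_info.major}"
    args = [
        python,
        "-u",  # unbuffered stdout/stderr so logs appear immediately
        script,
        "--graph-name", str(meta["graph_name"]),
        "--num-parts", str(meta["num_parts"]),
    ]
    # shlex.join quotes each argument safely for a POSIX shell.
    return shlex.join(args)


cmd = build_command('{"graph_name": "mag", "num_parts": 4}')
print(cmd)
```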

* Renamed the JSON file to metadata.json per the UX documentation

1. Renamed json file name per UX documentation.

* address comments

* fix

* fix doc

* use unbuffered logging to cure anxiety

* cure more anxiety

* Update tools/dispatch_data.py
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>

* oops
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>

8086d1ed

[Distributed] Graph chunking UX (#4365) · 067cd744

Quan (Andy) Gan authored Aug 11, 2022

* first commit

* update

* huh

* fix

* update

* revert core

* fix

* update

* rewrite

* oops

* address comments

* add graph name

* address comments

* remove sample metadata file

* address comments

* fix

* remove

* add docs

067cd744

10 Aug, 2022 4 commits

[Example]rgcn-ogbn-mag (#4331) · a88e7f7e

YJ-Zhao authored Aug 10, 2022



* rgcn-ogbn-mag

* Add link in README.md

* correct code format, add the reset_parameters function to the HeteroEmbedding module

* add the annotation in hetero.py

* add a unit test

* modify format

* Update
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-50-143.us-west-2.compute.internal>

a88e7f7e

[Example][Bugfix] Fix infograph example (#4298) · 919b7838

Chang Liu authored Aug 10, 2022



* Fix infograph example

* Update

* Revert the changes and update Doc

* Update

* Split lines to pass CI-lint

* Update

* Update
Co-authored-by: Mufei Li <mufeili1996@gmail.com>

919b7838

Update issue templates · 14a77c86
Minjie Wang authored Aug 10, 2022

14a77c86
Update issue templates · 91f4eee0
Minjie Wang authored Aug 10, 2022

91f4eee0

09 Aug, 2022 2 commits
- [Bug] Fix broken static_assert (#4342) · 182e1ad5
  Xin Yao authored Aug 09, 2022
  
  182e1ad5
- [Bug] A bunch of fixes in edge_softmax_hetero (#4336) · 62c827c8
  Quan (Andy) Gan authored Aug 09, 2022
```
* bunch of fixes

* Update test_edge_softmax_hetero.py

* Update test_edge_softmax_hetero.py
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
```
  62c827c8
07 Aug, 2022 2 commits

[Distributed] reduce memory consumption in distributed graph partitioning. (#4338) · 60bc0b76

kylasa authored Aug 07, 2022

* Fix for node_subgraph function, which seems to generate segmentation fault for very large partitions

1. Removed three intermediate DGL graph objects; the final DGL object is now created directly, subject to the following constraints:
a) Nodes are reordered so that local nodes are placed at the beginning of the node list, ahead of non-local nodes.
b) Edge order is maintained as passed into this function.
c) src/dst endpoints are mapped to target values based on the reshuffled node order.
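
The three constraints above can be sketched in plain Python as follows. This is a minimal illustration, not the code from the PR: the function name `reorder_local_first` and its arguments are assumptions, and the real implementation presumably works on tensors rather than lists.

```python
def reorder_local_first(node_ids, is_local, src, dst):
    """Reorder nodes so local ones come first and remap edge endpoints.

    Hypothetical helper; names and signature are illustrative only.
    """
    # (a) Stable partition: local nodes first, non-local after,
    #     each group keeping its original relative order.
    local = [n for n, flag in zip(node_ids, is_local) if flag]
    remote = [n for n, flag in zip(node_ids, is_local) if not flag]
    new_order = local + remote

    # Map each old node ID to its position in the new order.
    old_to_new = {old: new for new, old in enumerate(new_order)}

    # (b) Edges are visited in their original order, and
    # (c) each endpoint is translated through the old-to-new map.
    new_src = [old_to_new[s] for s in src]
    new_dst = [old_to_new[d] for d in dst]
    return new_order, old_to_new, new_src, new_dst


node_ids = [10, 11, 12, 13]
is_local = [False, True, False, True]
src = [10, 11, 13]
dst = [11, 12, 10]

new_order, old_to_new, new_src, new_dst = reorder_local_first(
    node_ids, is_local, src, dst
)
print(new_order)  # [11, 13, 10, 12]
print(new_src)    # [2, 0, 1]
print(new_dst)    # [0, 3, 2]
```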

* Code changes addressing CI comments for this PR

1. Used Da's suggested map to map nodes from the old order to the new order.
This is much simpler and more memory-efficient.

* Addressing CI Comments.

1. Reduced the amount of documentation to reflect the actual implementation.
2. Named the mapping object appropriately.

60bc0b76

[Bugfix] Fix the default value of `num_bases` in RelGraphConv module (#4321) · 5ba5106a
Chang Liu authored Aug 07, 2022
```
* Fix doc and default settings for RelGraphConv

* Add unit test

* Split msg in two lines to pass CI-lint
```
5ba5106a

06 Aug, 2022 1 commit

[Distributed] use alltoall fix to bypass gloo - alltoallv bug in distributed partitioning (#4311) · c1e01b1d

kylasa authored Aug 05, 2022

* Alltoall Fix to bypass gloo - alltoallv bug which is preventing further testing

1. Replaced the alltoallv gloo wrapper call with an alltoall message.
2. All the messages are padded to the same length.
3. The receiving side unpads the messages and continues processing.
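
The pad, exchange, unpad idea can be illustrated without any communication backend. The sketch below simulates the exchange in a single process; all function names are hypothetical, and storing the true length in the first slot of each padded message is one possible convention, not necessarily the one used in gloo_wrapper.py. In a real run the common padded length would also have to be agreed on across ranks.

```python
def pad_messages(messages, max_len, pad_value=0):
    """Pad each message to max_len and prefix it with its true length."""
    return [
        [len(m)] + list(m) + [pad_value] * (max_len - len(m))
        for m in messages
    ]


def unpad_messages(padded):
    """Recover the original payloads using the stored length prefix."""
    return [p[1:1 + p[0]] for p in padded]


def simulated_alltoall(send_buffers):
    """Single-process stand-in for alltoall with equal-sized messages.

    send_buffers[i][j] is what rank i sends to rank j;
    the result[j][i] is what rank j receives from rank i.
    """
    world = len(send_buffers)
    return [[send_buffers[i][j] for i in range(world)] for j in range(world)]


def alltoallv_via_padding(send_lists):
    """Emulate alltoallv (variable sizes) using fixed-size alltoall."""
    max_len = max(len(m) for rank in send_lists for m in rank)
    padded = [pad_messages(rank, max_len) for rank in send_lists]
    received = simulated_alltoall(padded)
    return [unpad_messages(rank) for rank in received]


send_lists = [
    [[1], [2, 3]],    # rank 0 sends [1] to rank 0 and [2, 3] to rank 1
    [[4, 5, 6], []],  # rank 1 sends [4, 5, 6] to rank 0, nothing to rank 1
]
received = alltoallv_via_padding(send_lists)
print(received)  # [[[1], [4, 5, 6]], [[2, 3], []]]
```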

* Code changes to address CI comments

1. Removed unused functions from gloo_wrapper.py
2. Changed the function signature of alltoallv_cpu_data as suggested.
3. Added docstring to include more description of the functionality inside alltoallv_cpu_data. Included more asserts to validate the assumptions.

* Changed the function name appropriately

Changed the function name from "alltoallv_cpu_data" to "alltoallv_cpu", which I believe is appropriate because the underlying functionality provides alltoallv, which is basically alltoall_cpu + padding.

* Added code and text to address the review comments.

1. Changed the function name to indicate the local use of this function.
2. Changed docstring to indicate the assumptions made by alltoallv_cpu function.

* Removed unused function from import statement

Removed unused/removed function from import statement.

c1e01b1d

03 Aug, 2022 1 commit
- [BugFix] fix etype check in DistGraph.edge_subgraph (#4322) · 43ba94ee
  Rhett Ying authored Aug 03, 2022
  
  43ba94ee
02 Aug, 2022 1 commit
- [Unittest] Improve test_dataloader (#4301) · 463650a7
  Xin Yao authored Aug 02, 2022
```
* test ddp dataloader

* add pure_gpu for edgedataloader

* resolve ddp issue
```
  463650a7
01 Aug, 2022 3 commits
- [BugFix] enable DistGraph.find_edge() works with str or tuple of str (#4319) · 4dd16f5d
  Rhett Ying authored Aug 01, 2022
  
  4dd16f5d
- [Feature] Enable UVA for Weighted Samplers (#4314) · 44b68641
  Xin Yao authored Aug 01, 2022
```
* enable use for weighted neighbor sampler and biased random walk

* add unit tests

* fix for mxnet/tf

* fix typo
```
  44b68641
- [Example][Refactor] Refactor GIN example (#4280) · 9a16a5e0
  Chang Liu authored Jul 31, 2022
```
* Refactor GIN example

* Update

* Update README

* Minor update

* README update
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
```
  9a16a5e0
30 Jul, 2022 2 commits
- [BugFix] fix incorrect _bias and bias usage (#4310) · d6957c28
  Rhett Ying authored Jul 30, 2022
  
  d6957c28
- [CI] separate distributed tests from torch cpu tests (#4313) · b3242e90
  Rhett Ying authored Jul 30, 2022
```
* [CI] separate distributed tests from torch cpu tests

* remove TF related env
```
  b3242e90
29 Jul, 2022 1 commit

[Feature] Add CUDA Weighted Neighborhood Sampling (#4064) · 86c81b4e

Xin Yao authored Jul 29, 2022



* add weighted sampling without replacement (A-Chao)

* improve Algorithm A-Chao with block-wise prefix sum

* correctly fill out_idxs

* implement weighted sampling with replacement

* small fix

* merge host-side code of weighted/uniform sampling

* enable unit tests for cuda weighted sampling

* move thrust/cub wrapper to the cmake file

* update docs accordingly

* fix linting

* fix linting

* fix unit test

* Bump external CUB/Thrust versions

* Fix code style and update description of algorithm design

* [Feature] GPU support weighted graph neighbor sampling
commit by pengqirong(OPPO)

* merge pengqirong's implementation

* revert the change to cub and thrust

* fix linting

* use DeviceSegmentedSort for better performance

* add more comments

* add necessary notes

* add necessary notes

* resolve some comments

* define THRUST_CUB_WRAPPED_NAMESPACE

* fix doc
Co-authored-by: 彭齐荣 <657017034@qq.com>

86c81b4e

28 Jul, 2022 2 commits
- [DistTest] fix incorrect shell if statement (#4304) · 17f1432a
  Rhett Ying authored Jul 28, 2022
```
* [DistTest] fix incorrect shell if statement

* fix incorrect use of dist.initialize()
```
  17f1432a
- [Example] Fixed device type in GraphSAGE inference (#4306) · ee91863e
  Wey Gu authored Jul 28, 2022
```
* Fixed device type in inference

* change buffer_device instead
```
  ee91863e
27 Jul, 2022 3 commits

[Transform] Allow add data to self loop created by AddSelfLoop or add_self_loop (#4261) · 2cf05c53

Pengfei Xia authored Jul 27, 2022



* Update

* Update functional.py

* Update

* Update test_transform.py

* Update

* Update functional.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update

* Update

* Update functional.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update module.py

* Update test_transform.py

* Update test_transform.py
Co-authored-by: Mufei Li <mufeili1996@gmail.com>

2cf05c53

[Bugfix] Fixed fatal log output in COO class (#4286) · 92f87f48
Zhuobin Huang authored Jul 27, 2022
```
Co-authored-by: Xin Yao <xiny@nvidia.com>
```
92f87f48
[Log] fix confusing error log in TCPSocket::Bind() (#4299) · 069068aa
Rhett Ying authored Jul 27, 2022
```
* [Log] fix confusing error log in TCPSocket::Bind()

* fix lint
```
069068aa

26 Jul, 2022 2 commits

[Bugfix] Fix dataloader pytorch cuda indexing (#4297) · 4f797295

Chang Liu authored Jul 26, 2022



* Modify to repro crash

* Revert to orig. scenario and add fix

* Update
Co-authored-by: Xin Yao <xiny@nvidia.com>

4f797295

[Feature] Add CUDA Weighted Randomwalk Sampling (#4243) · 7e6a6b4a

Dewvin authored Jul 26, 2022



* [Feature] Add CUDA Weighted Randomwalk Sampling

* [Feature] Add CUDA Weighted Randomwalk Sampling

* [Feature] Add CUDA Weighted Randomwalk Sampling

* [Feature] Add CUDA Weighted Randomwalk Sampling

* fix empty prob array && enable non-uniform for restart && enable unit tests

* update doc and guide for randomwalk and pinsage

* update comments
Co-authored-by: zhenliangqiu <ubuntu@ip-172-31-24-245.ap-southeast-1.compute.internal>
Co-authored-by: xiny <xiny@nvidia.com>

7e6a6b4a

25 Jul, 2022 2 commits
- [Dist][Optim] Change op order in SparseAdagrad to be numerically closer to PyTorch (#4253) · 7cd531c4
  Serge Panev authored Jul 24, 2022
```
Signed-off-by: Serge Panev <spanev@nvidia.com>
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
```
  7cd531c4
- [Dist] remove deprecated arguments from initialize() (#4284) · 8292bf32
  Rhett Ying authored Jul 25, 2022
  
  8292bf32
23 Jul, 2022 1 commit

[Distributed] Change for the new input format for distributed partitioning (#4273) · 7f8e1cf2

kylasa authored Jul 23, 2022

* Code changes to address the updated file format support for massively large graphs.

1. Updated the docstring for the starting function "gen_dist_partitions" to describe the newly proposed file format for the input dataset.
2. Code which was dependent on the structure of the old metadata JSON object has been updated to read from the newly proposed metadata file.
3. Fixed some errors where functions were invoked and the calling function expected return values from the invoked function.
4. This modified code has been tested on the "mag" dataset using 4-way partitions, and the results were verified.

* Code changes to address the CI review comments

1. Improved docstrings for some functions.
2. Added a new function in the utils.py to compute the id ranges and this is used in multiple places.
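
One common way to compute such ID ranges is a prefix sum over per-type counts. The sketch below assumes that approach; the function name `id_ranges`, its input format, and the half-open `[start, end)` convention are illustrative guesses, not the actual function added to utils.py.

```python
from itertools import accumulate


def id_ranges(counts):
    """Map each type name to a contiguous half-open ID range [start, end).

    `counts` maps a type name to its number of items; ranges are assigned
    in the dictionary's insertion order. Hypothetical helper.
    """
    names = list(counts)
    ends = list(accumulate(counts[name] for name in names))
    starts = [0] + ends[:-1]
    return {name: (s, e) for name, s, e in zip(names, starts, ends)}


ranges = id_ranges({"author": 3, "paper": 5, "institution": 2})
print(ranges)
# {'author': (0, 3), 'paper': (3, 8), 'institution': (8, 10)}
```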

* Added TODO to indicate the redundant data structure.

Because of the new file format changes, one of the dictionaries (node_feature_tids, node_tids) will be redundant. Added TODO text so that this will be removed in the next iteration of code changes.

7f8e1cf2

22 Jul, 2022 1 commit

[Example][Refactor] Refactor GAT example (#4240) · 6e1be69a

Chang Liu authored Jul 22, 2022



* Refactor gat example

* Add ppi support

* Minor update

* Update

* Update

* Change valid_xxx to val_xxx

* Readme Update

* Update
Co-authored-by: Mufei Li <mufeili1996@gmail.com>

6e1be69a

21 Jul, 2022 2 commits

Update (#4277) · 05d9d496

Mufei Li authored Jul 21, 2022


Co-authored-by: Ubuntu <ubuntu@ip-172-31-53-142.us-west-2.compute.internal>

05d9d496

[Example] Progressive Sample Selection (PSS). (#4263) · 05aca98d

Tianyue Cao authored Jul 21, 2022



* upload PSS

* upload PSS

* upload PSS

* pss code reformat

* fix bug

* update README

* update train bash

* remove vit

* update README

* delete InfoPlotter

* delete Smooth_AP_loss.py

* update README

* update README
Co-authored-by: Tianjun Xiao <xiaotj1990327@gmail.com>

05aca98d

20 Jul, 2022 1 commit

[Dist][Test] Add tests for multi-node DistEmbedding (#4256) · 5fc1d0c8

Serge Panev authored Jul 20, 2022


Signed-off-by: Serge Panev <spanev@nvidia.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

5fc1d0c8

19 Jul, 2022 3 commits

[Example][Bugfix] graphsage node classification example (#4260) · 701b746b

Chang Liu authored Jul 19, 2022



* Fix node_classification.py

* Minor update
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>

701b746b

[Dist][Test] Improves DistTensor test for num_part id > 2 (#4265) · 4dc5728a

Serge Panev authored Jul 19, 2022


Signed-off-by: Serge Panev <spanev@nvidia.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

4dc5728a

[Doc]Fix doc typo in DistEmbedding (#4258) · 740cd706
Rhett Ying authored Jul 19, 2022
```
* [Doc] fix docstring typo

* Update sparse_emb.py

* Update sparse_emb.py

* update link
```
740cd706

15 Jul, 2022 1 commit
- decompose (#4259) · 9a7ad16e
  Quan (Andy) Gan authored Jul 15, 2022
  
  9a7ad16e
14 Jul, 2022 2 commits

[DGL-Go][Doc] Update DGL-Go version to 0.0.2 and misc fix from bug bash (#4236) · fdbf5a0f

Mufei Li authored Jul 14, 2022



* Update

* Update

* Update

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-53-142.us-west-2.compute.internal>
Co-authored-by: Xin Yao <xiny@nvidia.com>

fdbf5a0f

[Unittest][Fix] Several unit tests fixes for Ampere+ and PyTorch 1.12+ (#4213) · 79b0a50a
Xin Yao authored Jul 14, 2022
```
* Fix test_csrmm for tensor core

* unset allow tf32 flag

* update test unified tensor

* skip fp16 for CPU
```
79b0a50a