Commits · 0d878ff8ef063e80d437c3a0535a92fd3d96d225 · OpenDAS / dgl

12 Apr, 2022 2 commits

[Example] Cleaned GraphSAGE node classification example with PyTorch Lightning (#3863) · 0d878ff8
Quan (Andy) Gan authored Apr 12, 2022
```
* cleaned pl node classification example

* conform to PL's method of updating the dataloader

* update

* lint

* fix test

* fix
```

[Examples] Add pure gpu mode in the GraphSAGE node classification and link prediction (#3856) · f931c6ba

Serge Panev authored Apr 11, 2022


Signed-off-by: Serge Panev <spanev@nvidia.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>


09 Apr, 2022 1 commit

[Example] Add TorchMetrics in README (#3913) · 01e50626

Mufei Li authored Apr 09, 2022



* Update README.md

* Update README.md

* Update README.md
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>


05 Apr, 2022 1 commit

[Examples] Update graphsage multi-gpu example to use multiple GPUs for validation and testing. (#3827) · 27a6eb56

nv-dlasalle authored Apr 05, 2022

* Update graphsage multi-gpu example to use multiple GPUs for validation and testing.

* Remove argmax

* Fix rebase error

* Add more documentation to example and simplify

* Switch to name shared memory

* Add comment about how training is distributed

* Restore iteration count

* fix munmap error reporting for better error messages
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>


25 Mar, 2022 1 commit

[Bug] Fix multiple issues in distributed multi-GPU GraphSAGE example (#3870) · 7d416086

Quan (Andy) Gan authored Mar 25, 2022



* fix distributed multi-GPU example device

* try Join

* update version requirement in README

* use model.join

* fix docs
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>


11 Mar, 2022 1 commit

[Examples] Add pure gpu example of graphsage (#3796) · 57d2f31f

nv-dlasalle authored Mar 10, 2022



* Add pure_gpu example of graphsage

* move to advanced directory
Co-authored-by: Quan Gan <coin2028@hotmail.com>


07 Mar, 2022 1 commit
- fix ddp dataloader in heterogeneous cases (#3801) · 44638b93
  Quan (Andy) Gan authored Mar 07, 2022
  
03 Mar, 2022 1 commit
- [Examples] fix path for load_graph (#3797) · 0528e90d
  Rhett Ying authored Mar 03, 2022
  
01 Mar, 2022 1 commit
- [Examples] re-locate load_graph for share (#3784) · 0ec43924
  Rhett Ying authored Mar 01, 2022
  
27 Feb, 2022 1 commit

[Doc and bugfix] Add docs and user guide and update tutorial for sampling pipeline (#3774) · d41d07d0

Quan (Andy) Gan authored Feb 28, 2022



* huuuuge update

* remove

* lint

* lint

* fix

* what happened to nccl

* update multi-gpu unsupervised graphsage example

* replace most of the dgl.mp.process with torch.mp.spawn

* update if condition for use_uva case

* update user guide

* address comments

* incorporating suggestions from @jermainewang

* oops

* fix tutorial to pass CI

* oops

* fix again
Co-authored-by: Xin Yao <xiny@nvidia.com>


09 Feb, 2022 1 commit

[Feature] CUDA UVA sampling for MultiLayerNeighborSampler (#3674) · 738e8318

Xin Yao authored Feb 09, 2022



* implement pin_memory/unpin_memory/is_pinned for dgl.graph

* update python docstring

* update c++ docstring

* add test

* fix the broken UnifiedTensor

* XPU_SWITCH for kDLCPUPinned

* a rough version ready for testing

* eliminate extra context parameter for pin/unpin

* update train_sampling

* fix linting

* fix typo

* multi-gpu uva sampling case

* disable new format materialization for pinned graphs

* update python doc for pin_memory_

* fix unit test

* UVA sampling for link prediction

* dispatch most csr ops

* update graphsage example to combine uva sampling and UnifiedTensor

* update graphsage example to combine uva sampling and UnifiedTensor

* update graphsage example to combine uva sampling and UnifiedTensor

* update doc

* update examples

* change unitgraph and heterograph's PinMemory to in-place

* update examples for multi-gpu uva sampling

* update doc

* fix linting

* fix cpu build

* fix is_pinned for DistGraph

* fix is_pinned for DistGraph

* update graphsage unsupervised example

* update doc for gpu sampling

* update some check for sampling device switching

* fix linting

* adapt for new dataloader

* fix linting

* fix

* fix some name issue

* adjust device check

* add unit test for uva sampling & fix some zero_copy bug

* fix linting

* update num_threads in graphsage examples
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>


07 Feb, 2022 1 commit
- Fix dist example padding problem (#3687) · 0767c5fc
  Jinjing Zhou authored Feb 07, 2022
  
30 Jan, 2022 1 commit

[Sampling] New sampling pipeline plus asynchronous prefetching (#3665) · 701b4fcc

Quan (Andy) Gan authored Jan 30, 2022

* initial update

* more

* more

* multi-gpu example

* cluster gcn, finalize homogeneous

* more explanation

* fix

* bunch of fixes

* fix

* RGAT example and more fixes

* shadow-gnn sampler and some changes in unit test

* fix

* wth

* more fixes

* remove shadow+node/edge dataloader tests for possible ux changes

* lints

* add legacy dataloading import just in case

* fix

* update pylint for f-strings

* fix

* lint

* lint

* lint again

* cherry-picking commit fa9f494

* oops

* fix

* add sample_neighbors in dist_graph

* fix

* lint

* fix

* fix

* fix

* fix tutorial

* fix

* fix

* fix

* fix warning

* remove debug

* add get_foo_storage apis

* lint


20 Jan, 2022 1 commit
- Update examples/pytorch/graphsage/experimental/README.md · 14ab462f
  Da Zheng authored Jan 20, 2022
```
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
```
19 Jan, 2022 1 commit
- fix. · 3b1978a3
  Da Zheng authored Jan 19, 2022
  
24 Dec, 2021 1 commit

Add 'nccl' backend in train_dist.py and fix pad_data function cuda bug (#3607) · 4889c578

xcwan authored Dec 24, 2021



* Add nccl backend and fix pad_data function cuda bug

* Update train_dist.py

* Update train_dist.py
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>


20 Dec, 2021 1 commit
- Update train_dist.py (#3594) · 421c3622
  Jinjing Zhou authored Dec 20, 2021
```
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
```
15 Dec, 2021 1 commit

[DistGNN, Graph partitioning] Libra partition (#3376) · 78e0dae6

Vasimuddin Md authored Dec 15, 2021



* added distgnn plus libra codebase

* Dist application codes

* added comments in partition code. changed the interface of partitioning call.

* updated readme

* create libra partitioning branch for the PR

* removed distgnn files for first PR

* updated kernel.cc

* added libra_partition.cc and moved libra code from kernel.cc to libra_partition.cc

* fixed lint error; merged libra2dgl.py and main_Libra.py to libra_partition.py; added graphsage/distgnn folder and partition script.

* removed libra2dgl.py

* fixed the lint error and cleaned the code.

* revisions due to PR comments. added distgnn/tools, which contains partition routines

* update 2 PR revision I

* fixed errors; also improved the runtime by 10x.

* fixed minor lint error

* fixed some more lints

* PR revision II changed the interface of libra partition function

* rewrite docstring
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>


06 Dec, 2021 1 commit
- Fix for distributed training (#3542) · 987db374
  Jinjing Zhou authored Dec 06, 2021
```
* tmp fix

* add description
```
21 Oct, 2021 1 commit

[Sampling] Implement dgl.compact_graphs() for the GPU (#3423) · a8c81018

Xin Yao authored Oct 21, 2021

* gpu compact graph template

* cuda compact graph draft

* fix typo

* compact graphs

* pass unit test but fail in training

* example using EdgeDataLoader on the GPU

* refactor cuda_compact_graph and cuda_to_block

* update training scripts

* fix linting

* fix linting

* fix exclude_edges for the GPU

* add --data-cpu & fix copyright


23 Sep, 2021 1 commit
- Fix torch import in example (#3372) · 367a3a34
  Junwen Yao authored Sep 22, 2021
  
20 Sep, 2021 1 commit
- Enable faster validation for pytorch graphsage example (#3361) · 01a22144
  nv-dlasalle authored Sep 19, 2021
  
02 Sep, 2021 1 commit
- Fix distributed device mapping problem. (#3313) · 21a40279
  xiang song(charlie.song) authored Sep 02, 2021
```
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
```
16 Jul, 2021 1 commit

[Feature][Performance][GPU] Introducing UnifiedTensor for efficient zero-copy host memory access from GPU (#3086) · 905c0aa5

David Min authored Jul 17, 2021

* Add pytorch-direct version

* Initial commit of unified tensor

* Merge branch 'master' of https://github.com/davidmin7/dgl



* Remove unnecessary things

* Fix error message

* Fix/Add descriptions

* whitespace fix

* add unpin

* disable IndexSelectCPUFromGPU with no CUDA

* add a newline for unified_tensor.py

* Apply changes based on feedback

* add 'os' module

* skip unified tensor unit test for cpu only

* Update tests/pytorch/test_unified_tensor.py
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>

* reflect feedback
Co-authored-by: shhssdm <shhssdm@gmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>


15 Jul, 2021 1 commit

[Bug fix] Various fix from bug bash (#3133) · 3f6f6941

Mufei Li authored Jul 15, 2021



* Update

* Update

* Update dependencies

* Update

* Update

* Fix ogbn-products gat

* Update

* Update

* Reformat

* Fix typo in node2vec_random_walk

* Specify file encoding

* Working for 6.7

* Update

* Fix subgraph

* Fix doc for sample_neighbors_biased

* Fix hyperlink

* Add example for udf cross reducer

* Fix

* Add example for slice_batch

* Replace dgl.bipartite

* Fix GATConv

* Fix math rendering

* Fix doc
Co-authored-by: Ubuntu <ubuntu@ip-172-31-28-17.us-west-2.compute.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-22-156.us-west-2.compute.internal>


13 Jul, 2021 1 commit

[Distributed] Deprecate old DistEmbedding impl, use synchronized embedding impl (#3111) · d7390763

xiang song(charlie.song) authored Jul 14, 2021



* fix.

* fix.

* fix.

* fix.

* Fix test

* Deprecate old DistEmbedding impl, use synchronized embedding impl

* update doc
Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
Co-authored-by: Da Zheng <zhengda1936@gmail.com>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>


28 Jun, 2021 1 commit
- deal with situation where num_layers equals 1 (#3066) · 2f43cdb3
  WangYQ authored Jun 28, 2021
  
25 Jun, 2021 1 commit
- [Doc] Update NodeDataLoader and EdgeDataLoader for GPU-based neighbor sampling (#3046) · 427a5a96
  Quan (Andy) Gan authored Jun 25, 2021
```
* update docstrings and tidy code

* add docs

* address comments

* Update __init__.py

* address comments
```
16 Jun, 2021 1 commit

[Distributed] Support hierarchical partitioning (#3000) · aaec3d8a

Da Zheng authored Jun 16, 2021



* add.

* fix.

* fix.

* fix.

* fix.

* add tests.

* support node split and edge split.

* support 1 partition.

* add tests.

* fix.

* fix test.

* use hierarchical partition.

* add check.
Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-22-57.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>


13 Jun, 2021 1 commit

[Performance] Perform to_block on the GPU when the dataloader is created with a GPU `device`. (#3016) · 8b64ae59

nv-dlasalle authored Jun 13, 2021

* add output device for dataloading

* Update dataloader

* Get sampler device from dataloader

* Fix line length

* Update examples

* Fix to_block GPU for empty relation types

* Handle the case where the DistGraph has None for the underlying graph
Co-authored-by: Da Zheng <zhengda1936@gmail.com>


10 Jun, 2021 1 commit

[Bugfix][Examples] Fix graphsage multigpu training example training set size (#3002) · 9497a9be

nv-dlasalle authored Jun 10, 2021



* Make multigpu graphsage use whole dataset

* Specify squeeze dimension

* Remove squeeze dimension
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>


26 May, 2021 1 commit

[Distributed] Specify the graph format for distributed training (#2948) · 18dbaebe

Da Zheng authored May 26, 2021



* explicitly set the graph format.

* fix.

* fix.

* fix launch script.

* fix readme.
Co-authored-by: Zheng <dzzhen@3c22fba32af5.ant.amazon.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-71-112.ec2.internal>


17 May, 2021 1 commit
- add use_ddp to dataloaders (#2911) · 7c7113f6
  Quan (Andy) Gan authored May 17, 2021
  
14 May, 2021 1 commit
- [Feature] Replacing thread_wrapped_func with minimal mp.Process wrapper (#2905) · caa6d607
  Quan (Andy) Gan authored May 14, 2021
```
* standardizing thread_wrapped_func

* lints

* Update __init__.py
```
11 May, 2021 1 commit
- [Model] Training GraphSAGE with PyTorch Lightning (#2878) · 70695ff8
  Quan (Andy) Gan authored May 11, 2021
```
* pytorch lightning initial examples

* revert most changes in dataloader to favor #2886.

* address comments
```
03 May, 2021 1 commit

[Distributed] Distributed node embedding and sparse optimizer (#2733) · 975eb8fc

xiang song(charlie.song) authored May 03, 2021



* Draft for sparse emb

* add some notes

* Fix

* Add sparse optim for dist pytorch

* Update test

* Fix

* upd

* upd

* Fix

* Fix

* Fix bug

* add transductive example

* Fix example

* Some fix

* Upd

* Fix lint

* lint

* lint

* lint

* upd

* Fix lint

* lint

* upd

* remove dead import

* update

* lint

* update unit test

* update example

* Add adam optimizer

* Add unit test and update data

* upd

* upd

* upd

* Fix docstring and fix some bugs in example code

* Update rgcn readme
Co-authored-by: Ubuntu <ubuntu@ip-172-31-57-25.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-24-210.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>


15 Apr, 2021 1 commit

[Performance][GPU] Enable GPU uniform edge sampling (#2716) · e70138bb

nv-dlasalle authored Apr 14, 2021



* Start on uniform GPU sampling

* Save more work

* Get cu file compiling

* Update sampling

* More changes

* Get GPU sampling for uniform probabilities solved

* Fix batch tensor migration

* Fix

* update kernels

* expand blocking

* Undo testing change

* Cut down on sampling overhead

* Fix replacement

* Update unit tests

* Add option to gpu sample in graphsage

* Copy only csc to gpu

* Add ogbn support

* Fix linting

* Remove nvtx from sample

* Improve documentation and error checking

* Expand documentation

* Update assert checking

* delete extra space

* Use standard dataloader when dataset is a dictionary

* ogb -> ogbn

* Fix edge selection determinism

* Fix typos

* Remove nvtx

* Add comment for self.fanout_arrays and assert

* Fix linting

* Migrate to scalarbatcher

* Fix indentation

* Fix batcher

* Fix indexing

* Only use databatcher for GPU

* Convert DGL NDArray to PyTorch Tensor

* Add optimization for PyTorch's F.tensor() for list of GPU tensors
Co-authored-by: Da Zheng <zhengda1936@gmail.com>


08 Apr, 2021 1 commit

[Distributed] Fix a bug in multiprocessing sampling. (#2826) · bfbbefa7

Da Zheng authored Apr 08, 2021


Co-authored-by: Ubuntu <ubuntu@ip-172-31-73-81.ec2.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>


30 Mar, 2021 1 commit

[Distributed] Simplify distributed API (#2775) · e36c5db6

Da Zheng authored Mar 29, 2021



* remove num_workers.

* remove num_workers.

* remove num_workers.

* remove num-servers.

* update error message.

* update docstring.

* fix docs.

* fix tests.

* fix test.

* fix.

* print messages in test.

* fix.

* fix test.

* fix.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-132.us-west-1.compute.internal>


22 Mar, 2021 1 commit

[Bugfix] Update deprecated method name in load_graph.py (#2769) · e6f6c2eb

Kaiqiang Xu authored Mar 22, 2021

Method `dataset.num_labels` has been deprecated and replaced by `dataset.num_classes`.
Updating the method name to avoid runtime warning.
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
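
The rename described in this commit message can be illustrated with a minimal sketch. `ToyDataset` below is a hypothetical stand-in, not DGL's dataset class, and modeling both names as properties is an assumption. It only shows the general pattern: the old name stays as an alias that warns, and callers switch to the new name to avoid the warning.

```python
import warnings


class ToyDataset:
    """Hypothetical stand-in for a dataset object; not DGL's actual class."""

    def __init__(self, num_classes):
        self._num_classes = num_classes

    @property
    def num_classes(self):
        # New, preferred name.
        return self._num_classes

    @property
    def num_labels(self):
        # Deprecated alias: emit a warning, then forward to the new name.
        warnings.warn(
            "num_labels is deprecated; use num_classes instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.num_classes


dataset = ToyDataset(num_classes=7)
n_classes = dataset.num_classes  # preferred access; emits no warning
```

Reading `dataset.num_labels` in this sketch still returns the same value but raises a `DeprecationWarning`, which is the kind of runtime warning the commit avoids by switching names.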
