Commits · 9d425315da2393895e11adcdbf0b868d70333c3b · OpenDAS / dgl

27 Jun, 2022 1 commit

[Dist] enable USE_EPOLL by default (#4167) · 9d425315

Rhett Ying authored Jun 27, 2022

* [Dist] enable USE_EPOLL by default

* fix build issue on windows

* fix build issue on windows

* fix build issue on windows

* fix build issue on windows

* fix build issue on windows

* fix build issue

9d425315

24 Jun, 2022 2 commits

[Doc] fix a bug in guide_cn (#4149) · d1f6f3a8
PotatoChipsNinja authored Jun 24, 2022
```
Co-authored-by: Xin Yao <xiny@nvidia.com>
```
d1f6f3a8

[Performance][Optimizer] Enable using UVA and FP16 with SparseAdam Optimizer (#3885) · 020f0249

nv-dlasalle authored Jun 23, 2022



* Add uva by default to embedding

* More updates

* Update optimizer

* Add new uva functions

* Expose new pinned memory function

* Add unit tests

* Update formatting

* Fix unit test

* Handle auto UVA case when training is on CPU

* Allow per-embedding decisions for whether to use UVA

* Address sparse_optim.py comments

* Remove unused templates

* Update unit test

* Use dgl allocate memory for pinning

* allow automatically unpin

* workaround for d2h copy with a different dtype

* fix linting

* update error message

* update copyright
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

020f0249

23 Jun, 2022 5 commits

[BugFix] Fix Correct&Smooth (#4102) (#4158) · 548c85ff

Lucas Prieto authored Jun 23, 2022


Co-authored-by: Mufei Li <mufeili1996@gmail.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>

548c85ff

[Example][Bugfix] Remove all torchtext legacy-related APIs for pytorch/pinsage example (#4130) · 598d746e

Chang Liu authored Jun 23, 2022



* Remove all torchtext legacy-related APIs

* Remove unused BagOfWordsPretrained class, and fix some typos
Co-authored-by: Mufei Li <mufeili1996@gmail.com>

598d746e

[Bugfix][Rework] Automatically unpin tensors pinned by DGL (rework #3997) (#4135) · 077e002f

Xin Yao authored Jun 23, 2022



* Explicitly unpin tensoradapter allocated arrays

* Undo unrelated change

* Add unit test

* update unit test

* add pinned_by_dgl flag to NDArray::Container

* use dgl.ndarray for holding the pinning status

* update multi-gpu uva inference

* reinterpret cast NDArray::Container* to DLTensor* in MoveAsDLTensor

* update unpin column and examples

* add unit test for unpin column
Co-authored-by: Dominique LaSalle <dlasalle@nvidia.com>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>

077e002f

[Fix] Fix compiler warnings - part 1 (#4051) · 1ad65879

Triston authored Jun 22, 2022



* Fix a cub compile error for CUDA 11.5

* Fix comparison of integer expressions of different signedness in coo_sort.cu file

* Fix comparison of integer expressions of different signedness in cuda_compact_graph.cu file

* Remove never referenced variable in spmm.cu

* Fix comparison of integer expressions of different signedness in rowwise_pick.h file

* Fix comparison of integer expressions of different signedness in choice.cc file

* Remove never referenced variable col_data in spat_op_impl_coo.cc

* Remove never referenced variable allowed in global_uniform.cc

* Fix comparison of integer expressions of different signedness in graph.cc

* Fix comparison of integer expressions of different signedness in graph_apis.cc

* Fix the unused ctx variable in ndarray_partition.cc file for CPU-only build

* Fix comparison of integer expressions of different signedness in libra_partition.cc

* Fix comparison of integer expressions of different signedness in graph_op.cc
Co-authored-by: Triston Cao <tristonc@nvidia.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

1ad65879

[Dist] etype is not guaranteed to be sorted (#4156) · ab1b2811
Rhett Ying authored Jun 23, 2022

ab1b2811

22 Jun, 2022 3 commits
- [Bug Fix] Fix the case when reverse_edge is False for citation graphs (#3840) · 4d3c01d6
  Mufei Li authored Jun 22, 2022
```
* Update citation_graph.py

* Update

* Update

* Update
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
```
  4d3c01d6
- [Bug] Fix problem with ShaDowKHopSampler working with reverse edge type exclusion (#4145) · 71157b05
  Quan (Andy) Gan authored Jun 22, 2022
```
* fix

* fix

* Update utils.py
```
  71157b05
- [BugFix] fix unstable sort when using dataloader with HeteroGraph (#4147) · 794ec4a4
  maqy authored Jun 22, 2022
```
* fix unstable sort

* add torch version check

* reformat

* split too long comments

* Update dataloader.py
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
```
  794ec4a4
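
The unstable-sort fix in #4147 concerns the order of items after they are sorted by type. As a minimal sketch of why stability matters (plain Python, not DGL's dataloader code; the names `type_ids` and `group_by_type` are hypothetical), a stable sort keeps the original relative order of items that share the same key:

```python
def group_by_type(type_ids):
    """Return item positions ordered by type id, keeping input order within each type."""
    # Python's sorted() is guaranteed to be stable, so ties keep their input order.
    return sorted(range(len(type_ids)), key=lambda i: type_ids[i])

type_ids = [1, 0, 1, 0, 2, 0]
order = group_by_type(type_ids)
print(order)  # [1, 3, 5, 0, 2, 4]
```

With an unstable sort, the positions inside each type group could come back in any order, so results that depend on that order may differ from run to run.
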
21 Jun, 2022 1 commit

[DGL-Go] Inference for Node Prediction Pipeline (full & ns) (#4095) · 31e4a89b

Mufei Li authored Jun 21, 2022

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

31e4a89b

20 Jun, 2022 3 commits

[Dist] defer to load node/edge feats (#4143) · 69226588

Rhett Ying authored Jun 20, 2022



* [Dist] defer to load node/edge feats

* fix lint

* Update python/dgl/distributed/partition.py
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>

* Update python/dgl/distributed/partition.py
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>

* fix lint
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>

69226588

[Doc] Add ArangoDB-DGL in DGL-powered projects (#4139) · 532d4ac3
Anthony Mahanna authored Jun 20, 2022
```
* Update README.md

* fix: ArangoDB hyperlink
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
```
532d4ac3
[Dist] re-try to receive rpc ndarray msg (#4142) · 3ffe0c09
Rhett Ying authored Jun 20, 2022

3ffe0c09

17 Jun, 2022 1 commit

[Doc] Add distributed link prediction tutorial (#3993) · 4a9be030

RuisiZhang authored Jun 17, 2022



* add dist tutorial

* add predictor in dist prediction

* refine after rendering

* change links

* Update 2_link_prediction.py
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

4a9be030

16 Jun, 2022 1 commit
- [Dist] set socket as default backend for RPC (#4120) · b258729b
  Rhett Ying authored Jun 16, 2022
```
* [Dist] set socket as default backend for RPC

* add tests both for socket and tensorpipe
```
  b258729b
15 Jun, 2022 3 commits

add gtrick in DGL-powered projects (#4128) · 702d08db
Yunxin Sang authored Jun 15, 2022

702d08db

[Doc] Updated transform ops list in dgl.rst (Issue #4087) (#4123) · 5f04fc2b

ndickson-nvidia authored Jun 15, 2022



* * Added functions from dgl.transforms.functional that were missing from the list for documentation in dgl.rst

* * Sorted transform ops list in dgl.rst in alphabetical order
Co-authored-by: Xin Yao <xiny@nvidia.com>

5f04fc2b

[Dist] Add env var for non-default SSH configs in tests (#4098) · 652f4c07

Serge Panev authored Jun 14, 2022


Signed-off-by: Serge Panev <spanev@nvidia.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

652f4c07

14 Jun, 2022 5 commits

[Bugfix] Disable non-atomic atomic operations (#4117) · 473bf15f
nv-dlasalle authored Jun 14, 2022
```
* Disable non-atomic atomic operations

* Improve error message

* Make error message more friendly
```
473bf15f

[Bugfix] Fix fail to create_shared_mem_array in ddp spawn train #4110 (#4111) · 9a6f2924

彭齐荣 authored Jun 14, 2022



* Fix fail to create_shared_mem_array in ddp spawn train #4110

Fix fail to create_shared_mem_array in ddp spawn train #4110

* [Bugfix] Fix fail to create_shared_mem_array in ddp spawn train #4110

[Bugfix] Fix fail to create_shared_mem_array in ddp spawn train #4110
Replace random.seed() with random_ = random.Random()

* Update pytorch.py
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>

9a6f2924
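
The commit message above mentions swapping `random.seed()` for a dedicated `random.Random()` instance. A minimal sketch of the difference, using only the standard library (the helper `make_unique_name` and the `shm` prefix are hypothetical and not taken from DGL): reseeding the module-level generator affects every other caller in the process, while a private instance keeps its state to itself.

```python
import random

def make_unique_name(prefix, rng):
    """Build a name with a random 8-hex-digit suffix drawn from the given generator."""
    return f"{prefix}_{rng.getrandbits(32):08x}"

# A private generator with its own state, independent of the module-level one.
random_ = random.Random()

random.seed(0)                 # user code seeds the global generator
expected = random.random()     # first value of the seeded global stream

random.seed(0)                 # reseed, then draw from the private generator in between
name = make_unique_name("shm", random_)
observed = random.random()

# Drawing from random_ neither advanced nor reset the global stream.
print(observed == expected)  # True
```
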

[CI] loosen time limit for unit tests on Win64 (#4119) · 7936e2ed
Rhett Ying authored Jun 14, 2022

7936e2ed

[Dataset] Add Flickr and Yelp dataset (#4099) · defa292b

RecLusIve-F authored Jun 14, 2022



* Add Flickr and Yelp dataset

* Update flickr.py

* update

* Update yelp.py

* Update yelp.py

* update

* Update yelp.py

* Update test_data.py

* Update yelp.py

* update

* Update test_data.py

* Update yelp.py
Co-authored-by: Mufei Li <mufeili1996@gmail.com>

defa292b

[Dist] master port should be fixed for all trainers (#4108) · 9501ed6a
Rhett Ying authored Jun 14, 2022
```
* [Dist] master port should be fixed for all trainers

* add tests for tools/launch.py
```
9501ed6a

12 Jun, 2022 2 commits

[dataset] Add a `reorder` flag to builtin datasets (#4104) · 92e77330

Huarui HE authored Jun 12, 2022



* add argument reorder=False for citation_graph

* add description of the argument reorder

* add reordered/un_reordered save_path

* add version number postfix
Co-authored-by: Mufei Li <mufeili1996@gmail.com>

92e77330

Update README.md (#4105) · 148575e4
Quan (Andy) Gan authored Jun 12, 2022

148575e4

11 Jun, 2022 1 commit

[Fix] Wrap all CUDA runtime API/CUB calls with macro (#4083) · 60b1c992

Xin Yao authored Jun 11, 2022



* Wrap all CUDA runtime API/CUB calls with macro

* remove the usage of explicit cudaMalloc in favor of AllocWorkspace

* fix typo
Co-authored-by: Israt Nisa <neesha295@gmail.com>

60b1c992

09 Jun, 2022 3 commits
- [Dist] avoid busy ssh connection (#4096) · 966d1aa8
  Rhett Ying authored Jun 09, 2022
  
  966d1aa8
- disable multiple groups tests due to random failure in CI (#4101) · abcc9cce
  Rhett Ying authored Jun 09, 2022
  
  abcc9cce
- [Bugfix] Fix example case: examples/pytorch/ogb/ogbn-proteins and... · 549df65a
  Chang Liu authored Jun 09, 2022
```
[Bugfix] Fix example case: examples/pytorch/ogb/ogbn-proteins and examples/pytorch/ogb/ogbn-products (#4080)

* [Bugfix] Fix ogbn-gat-proteins/products examples

* Remove unused BatchSampler definition

* Remove comments to ease reading/reviewing

* Remove dataloader wrapper
```
  549df65a
08 Jun, 2022 4 commits
- [Dist] enable time out when fetching msg (#4043) · cac3720b
  Rhett Ying authored Jun 08, 2022
```
* [Dist] enable time out when fetching msg

* fix lint error

* minor refinements

* improve minor log

* fix dist test

* fix timeout issue in tensorpipe
```
  cac3720b
- [DistTest] add python test of RPC (#4093) · 2de80dde
  Rhett Ying authored Jun 08, 2022
```
* [DistTest] add python test of RPC

* remove return
```
  2de80dde
- [DistTest] add basic pipeline for dist test across machines (#3984) · c1ff4c9b
  Rhett Ying authored Jun 08, 2022
```
* [DistTest] add basic pipeline for dist test across machines

* move launch remote cmd to separate file

* add test for rpc

* fix function naming rule
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
```
  c1ff4c9b
- [Bug] Fixed missing dgl.transforms.functional __all__ entries (#4089) · d92a4e8b
  ndickson-nvidia authored Jun 07, 2022
  
  d92a4e8b
07 Jun, 2022 2 commits

[Bug][Feature] Added cublasGemm<__half> specialization (#3988) (#4029) · eabcc58e

ndickson-nvidia authored Jun 07, 2022

* * Added specialization of cublasGemm function for `__half` type, to try to address https://github.com/dmlc/dgl/issues/3988



* * Added USE_FP16 guard

* * Added test cases to test_segment_mm, to test newly-added FP16 specialization of cublasGemm

* * Replaced for loop in test_segment_mm with pytest.mark.parametrize, as recommended
Co-authored-by: Xin Yao <xiny@nvidia.com>

eabcc58e

[Doc] Update link to correct destination. (#3966) · 85c2ff71

Tudor Andrei Dumitrascu authored Jun 07, 2022



* Update link to correct destination.

* Update 4_rgcn.py

* Update 4_rgcn.py

* Update tutorials/models/1_gnn/4_rgcn.py

* Update tutorials/models/1_gnn/4_rgcn.py

* Update tutorials/models/1_gnn/4_rgcn.py
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

85c2ff71

06 Jun, 2022 3 commits

[Bug] Added common operations for FP16 on older GPUs (#4079) · ea44da50

ndickson-nvidia authored Jun 06, 2022

* * Added support for common operations on FP16 (`half` or `__half`) for older GPU architectures
* Fixed an issue with previous check for FP16 support

* * Removing FP16 type checks, since they should no longer be needed

* * Fixed AtomicAdd to be atomic for `float` and `double` on old GPU architectures. Unfortunately, atomicCAS for unsigned short appears to be unavailable until architecture 70, so half will have to stay non-atomic on old GPUs.

* * Fixed non-atomic version of `AtomicAdd<half>` for older GPUs to return the old value instead of the new value

ea44da50
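
The notes above refer to `atomicCAS` and to an add that returns the previous value. A minimal Python sketch of that compare-and-swap retry loop follows. It illustrates the general pattern only and is not DGL's CUDA code; the `Cell` class and its lock-based `compare_and_swap` are hypothetical stand-ins for the hardware primitive.

```python
import threading

class Cell:
    """A single value with a compare-and-swap primitive (emulated with a lock)."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`; return the value seen."""
        with self._lock:
            seen = self.value
            if seen == expected:
                self.value = new
            return seen

def atomic_add(cell, delta):
    """Add `delta` to the cell and return the value it held before the add."""
    old = cell.value
    while True:
        seen = cell.compare_and_swap(old, old + delta)
        if seen == old:
            return old   # swap succeeded; report the previous value
        old = seen       # another thread changed the cell; retry with its value

def worker(cell, count):
    for _ in range(count):
        atomic_add(cell, 1.0)

cell = Cell(0.0)
threads = [threading.Thread(target=worker, args=(cell, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.value)  # 4000.0
```

Returning the previous value matches the contract of CUDA's built-in `atomicAdd`, which is why a fallback that returns the new value would surprise callers.
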

parallelize csr2coo (#4081) · 31a81438
Quan (Andy) Gan authored Jun 06, 2022
```
Co-authored-by: Xin Yao <xiny@nvidia.com>
```
31a81438

wrap all cuda kernel calls with macro (#4066) · 6014623d

Xin Yao authored Jun 06, 2022


Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>

6014623d