Commits · 436de3d120896de200aae6506e4e4ff5e716ceab · OpenDAS / dgl

09 May, 2022 1 commit

[Dist][Optim] Fixed race conditions in distributed SparseAdam and SparseAdagrad (#3971) · edf2d526

ndickson-nvidia authored May 09, 2022

* Fixed race condition bug in distributed/optim/pytorch/sparse_optim.py's SparseAdam::update, corresponding with the bug fixed in the non-distributed version in https://github.com/dmlc/dgl/pull/3013 , though using the newer Event-based approach from that corresponding function.  The race condition would often result in NaNs, like the previously fixed bug. https://github.com/dmlc/dgl/issues/2760

* Fixed race condition bug in SparseAdagrad::update corresponding with the one fixed in SparseAdam::update in the previous commit.  Same info applies.

* Fixed typo in all copies of a repeatedly-copied comment near bug fixed 3 commits ago, checking all implementations nearby for a corresponding bug.  (All of them appear to have been fixed as of 2 commits ago.)

* Removed trailing whitespace
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

edf2d526

17 Mar, 2020 1 commit

[DGL-KE] Add license to every file header (#1368) · 635dfb4a

Chao Ma authored Mar 17, 2020

* update metis

* update

* update dataloader

* update dataloader

* new script

* update

* update

* update

* update

* update

* update

* update

* update dataloader

* update

* update

* update

* update

* update

* update

* update

* Add license to every file header

635dfb4a

09 Feb, 2020 1 commit

[Optimization][KG] Several optimizations on DGL-KG (#1233) · ffe58983

xiang song(charlie.song) authored Feb 10, 2020

* Several optimizations on DGL-KG:
1. Sorted positive edges for sampling which can reduce random
   memory access during positive sampling
2. Asynchronous node embedding update
3. Balanced Relation Partition that gives a balanced number of
   edges in each partition. When there is no cross-partition
   relation, the relation embedding can be pinned in GPU memory
4. Tunable neg_sample_size instead of fixed neg_sample_size

* Fix test

* Fix test and eval.py

* Now TransR is OK

* Fix single GPU with mix_cpu_gpu

* Add app tests

* Fix test script

* fix mxnet

* Fix sample

* Add docstrings

* Fix

* Default value for num_workers

* Upd

* upd

ffe58983
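The balanced relation partition in item 3 of the commit above can be illustrated with a small sketch. This is not the DGL-KG implementation: the function name `balanced_relation_partition`, its parameters `edge_relations` and `num_parts`, and the greedy largest-first strategy are all assumptions made for illustration. It keeps every relation inside a single partition (so no relation crosses partitions) while trying to even out the number of edges per partition.

```python
import heapq
from collections import Counter


def balanced_relation_partition(edge_relations, num_parts):
    """Hypothetical sketch: assign whole relations to partitions.

    edge_relations: one relation id per edge (ids must be sortable).
    num_parts: number of partitions.
    Returns (relation -> partition mapping, edge count per partition).
    """
    counts = Counter(edge_relations)
    # Min-heap of (edge_count, partition_id); the lightest partition pops first.
    heap = [(0, part) for part in range(num_parts)]
    heapq.heapify(heap)
    assignment = {}
    # Place the largest relations first (greedy assumption, ties broken by id).
    for rel, cnt in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, part = heapq.heappop(heap)
        assignment[rel] = part
        heapq.heappush(heap, (load + cnt, part))
    loads = [0] * num_parts
    for rel, part in assignment.items():
        loads[part] += counts[rel]
    return assignment, loads


if __name__ == "__main__":
    # 12 edges over 4 relations, split across 2 partitions.
    rels = [0] * 5 + [1] * 3 + [2] * 2 + [3] * 2
    print(balanced_relation_partition(rels, 2))
```

Because each relation lands in exactly one partition, a worker that owns a partition would only ever touch its own relation embeddings, which is the situation the commit message describes as allowing them to stay in GPU memory.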

08 Jan, 2020 1 commit

[Feature][KG] Multi-GPU training support for DGL KGE (#1178) · bb6a6476

xiang song(charlie.song) authored Jan 08, 2020

* multi-gpu

* Pytorch can run but test has acc problem

* pytorch train/eval can run in multi-gpu

* Fix eval

* Fix

* Fix mxnet

* trigger

* trigger

* Fix mxnet score_func

* Fix

* check

* Fix default arg

* Fix train_mxnet mix_cpu_gpu

* Make relation mix_cpu_gpu

* delete some dead code

* some opt for update

* Fix cpu grad update

bb6a6476

01 Dec, 2019 1 commit

[KG][Score Func] Update TransE with L2 distance support. (#1059) · dca0e376

xiang song(charlie.song) authored Dec 02, 2019

* Add L2 distance score for TransE

* Update README.md

* Use linalg.gemm to speedup mx l2 dist

* Fix

dca0e376

14 Nov, 2019 1 commit

[Model] add RotatE to dgl-kg (#964) · 8b17a5c1

MilkshakeForReal authored Nov 13, 2019

Add RotatE support for KGE (apps/kg)
Performance Result:
Dataset FB15k:
Result from Paper:
MR: 40
MRR: 0.797
HIT@1: 74.6
HIT@3: 83.0
HIT@10: 88.4

Our Impl:
MR: 39.6
MRR: 0.725
HIT@1: 62.8
HIT@3: 80.2
HIT@10: 87.5

8b17a5c1
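The metrics reported in the commit above (MR, MRR, HIT@k) are standard link-prediction ranking metrics. As a minimal sketch of how they are derived from the rank of each correct entity, assuming a hypothetical helper named `ranking_metrics` that is not part of the DGL code base:

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and HITS@k from 1-based ranks of the correct entities."""
    n = len(ranks)
    if n == 0:
        raise ValueError("ranks must be non-empty")
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank (lower is better)
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank
    }
    for k in ks:
        # Fraction of test triples whose correct entity is ranked in the top k.
        metrics[f"HITS@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


if __name__ == "__main__":
    print(ranking_metrics([1, 2, 4, 20]))
```

The sketch returns HITS@k as fractions in [0, 1]; the figures listed in the commit message are given as percentages.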

01 Nov, 2019 1 commit

[NN]Supporting TransR in app/kg score_func (#945) · 7f65199a

xiang song(charlie.song) authored Nov 02, 2019

* Add TransR for kge

* Now Pytorch TransR can run

* Add MXNet TransR

* Now mxnet can work with small dim size

* Add test

* Pass simple test_score

* Update test with transR score func

* Update RESCAL MXNet

* Add missing funcs

* Update init func for transR score

* Revert "Update init func for transR score"

This reverts commit 0798bb886095e7581f6675da5343376844ce45b9.

* Update score func of TransR MXNet

Make it more memory friendly and faster,
though it is still very slow and memory consuming

* Update best config

* Fix random seed for test

* Init score-func specific var

* Update Readme

7f65199a

12 Oct, 2019 1 commit

[KG] Add RESCAL model to DGL-KGE (#923) · 02fb0581

Chao Ma authored Oct 12, 2019

* Add RESCAL model

* update

* update

* match acc

* update

* add README.md

* fix

02fb0581

11 Oct, 2019 1 commit

[KG] ComplEx score func for MXNet (#918) · bde75256

xiang song(charlie.song) authored Oct 11, 2019

* upd

* fig edgebatch edges

* add test

* trigger

* Update README.md for pytorch PinSage example.

Add a note that the PinSage model example under
example/pytorch/recommendation only works with Python 3.6+
as its dataset loader depends on the stanfordnlp package,
which works only with Python 3.6+.

* Provide a frame agnostic API to test nn modules on both CPU and CUDA side.

1. make dgl.nn.xxx frame agnostic
2. make test.backend include dgl.nn modules
3. modify test_edge_softmax of test/mxnet/test_nn.py and
    test/pytorch/test_nn.py to work on both CPU and GPU

* Fix style

* Delete unused code

* Make agnostic test only related to tests/backend

1. clear all agnostic related code in dgl.nn
2. make test_graph_conv agnostic to cpu/gpu

* Fix code style

* fix

* doc

* Make all test code under tests.mxnet/pytorch.test_nn.py
work on both CPU and GPU.

* Fix syntax

* Remove rand

* Add TAGCN nn.module and example

* Now tagcn can run on CPU.

* Add unit test for TGConv

* Fix style

* For pubmed dataset, using --lr=0.005 can achieve better acc

* Fix style

* Fix some descriptions

* trigger

* Fix doc

* Add nn.TGConv and example

* Fix bug

* Update data in mxnet.tagcn test acc.

* Fix some comments and code

* delete useless code

* Fix naming

* Fix bug

* Fix bug

* Add test for mxnet TAGCov

* Add test code for mxnet TAGCov

* Update some docs

* Fix some code

* Update docs dgl.nn.mxnet

* Update weight init

* Fix

* reproduce the bug

* Fix concurrency bug reported at #755.
Also make test_shared_mem_store.py more deterministic.

* Update test_shared_mem_store.py

* Update dmlc/core

* Add complEx for mxnet

* ComplEx is ready for MXNet

bde75256

04 Oct, 2019 1 commit

[KG] save embeddings in NumPy (#900) · df8a7be5

Da Zheng authored Oct 04, 2019

* fix loading and saving.

* use numpy.

df8a7be5

02 Oct, 2019 1 commit

[KG][Model] Knowledge graph embeddings (#888) · 15b951d4

Da Zheng authored Oct 02, 2019

* upd

* fig edgebatch edges

* add test

* trigger

* Update README.md for pytorch PinSage example.

Add a note that the PinSage model example under
example/pytorch/recommendation only works with Python 3.6+
as its dataset loader depends on the stanfordnlp package,
which works only with Python 3.6+.

* Provide a frame agnostic API to test nn modules on both CPU and CUDA side.

1. make dgl.nn.xxx frame agnostic
2. make test.backend include dgl.nn modules
3. modify test_edge_softmax of test/mxnet/test_nn.py and
    test/pytorch/test_nn.py to work on both CPU and GPU

* Fix style

* Delete unused code

* Make agnostic test only related to tests/backend

1. clear all agnostic related code in dgl.nn
2. make test_graph_conv agnostic to cpu/gpu

* Fix code style

* fix

* doc

* Make all test code under tests.mxnet/pytorch.test_nn.py
work on both CPU and GPU.

* Fix syntax

* Remove rand

* Add TAGCN nn.module and example

* Now tagcn can run on CPU.

* Add unit test for TGConv

* Fix style

* For pubmed dataset, using --lr=0.005 can achieve better acc

* Fix style

* Fix some descriptions

* trigger

* Fix doc

* Add nn.TGConv and example

* Fix bug

* Update data in mxnet.tagcn test acc.

* Fix some comments and code

* delete useless code

* Fix naming

* Fix bug

* Fix bug

* Add test for mxnet TAGCov

* Add test code for mxnet TAGCov

* Update some docs

* Fix some code

* Update docs dgl.nn.mxnet

* Update weight init

* Fix

* init version.

* change default value of regularization.

* avoid specifying adversarial_temperature

* use default eval_interval.

* remove original model.

* remove optimizer.

* set default value of num_proc

* set default value of log_interval.

* don't need to set neg_sample_size_valid.

* remove unused code.

* use uni_weight by default.

* unify model.

* rename model.

* remove unnecessary data sampler.

* remove the code for checkpoint.

* fix eval.

* raise exception in invalid arguments.

* remove RowAdagrad.

* remove unsupported score function for now.

* Fix bugs of kg
Update README

* Update Readme for mxnet distmult

* Update README.md

* Update README.md

* revert changes on dmlc

* add tests.

* update CI.

* add tests script.

* reorder tests in CI.

* measure performance.

* add results on wn18

* remove some code.

* rename the training script.

* new results on TransE.

* remove --train.

* add format.

* fix.

* use EdgeSubgraph.

* create PBGNegEdgeSubgraph to simplify the code.

* fix test

* fix CI.

* run nose for unit tests.

* remove unused code in dataset.

* change argument to save embeddings.

* test training and eval scripts in CI.

* check Pytorch version.

* fix a minor problem in config.

* fix a minor bug.

* fix readme.

* Update README.md

* Update README.md

* Update README.md

15b951d4