- 24 Jun, 2022 1 commit
nv-dlasalle authored
* Add UVA by default to embedding
* More updates
* Update optimizer
* Add new UVA functions
* Expose new pinned memory function
* Add unit tests
* Update formatting
* Fix unit test
* Handle auto UVA case when training is on CPU
* Allow per-embedding decisions for whether to use UVA
* Address sparse_optim.py comments
* Remove unused templates
* Update unit test
* Use DGL-allocated memory for pinning
* Allow automatic unpinning
* Workaround for D2H copy with a different dtype
* Fix linting
* Update error message
* Update copyright

Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
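The UVA work above keeps large embedding tables in page-locked (pinned) host memory so the GPU can fetch rows on demand instead of holding the whole table on the device. Below is a minimal PyTorch sketch of the pinned-memory gather pattern only; it stages the copy through the CPU, whereas DGL's UVA kernels read the pinned table from the GPU directly (zero-copy), and the helper name `gather_rows` is illustrative, not part of DGL's API.

```python
import torch

# Embedding table kept in pinned (page-locked) host memory. Pinned
# buffers are addressable from the GPU under unified virtual
# addressing (UVA) and support asynchronous host-to-device copies.
num_nodes, dim = 100_000, 128
table = torch.randn(num_nodes, dim).pin_memory()

def gather_rows(idx: torch.Tensor) -> torch.Tensor:
    """Gather a minibatch of embedding rows onto the GPU.

    Illustrative only: row selection runs on the CPU here, while a
    true UVA gather indexes the pinned table from a GPU kernel.
    """
    rows = table.index_select(0, idx.cpu())
    return rows.to("cuda", non_blocking=True)

if torch.cuda.is_available():
    batch = torch.randint(0, num_nodes, (1024,), device="cuda")
    emb = gather_rows(batch)  # (1024, 128), resident on the GPU
```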
- 23 Mar, 2022 1 commit
Jinjing Zhou authored
* try fix
* try fix
* try fix
* try fix
* Revert "try fix" (reverts commit a3fa0b1e9c0ab892cc3a22acf3770903db8b14a7)
* try fix shared memory
* try fix shared memory
* try fix image version
* fix
- 24 Jun, 2021 1 commit
xiang song(charlie.song) authored
[Bug fix] Use shared memory for grad sync when NCCL is not available as the PyTorch distributed backend. (#3034)

* Use shared memory for grad sync when NCCL is not available as the PyTorch distributed backend; fix small bugs and update unit tests
* Fix bug
* Update test
* Update test
* Fix unit test
* Fix unit test
* Fix test
* Fix
* Simple update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-24-212.ec2.internal>
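The fallback this commit adds can be pictured with plain `torch.multiprocessing`: when NCCL (or another distributed backend) is unavailable, worker processes can still synchronize gradients by accumulating into a tensor placed in shared memory. A minimal sketch of the idea, not DGL's implementation; the lock-and-barrier choreography is an assumption made for the example.

```python
import torch
import torch.multiprocessing as mp

def worker(rank, world_size, shared_grad, lock, barrier):
    # Stand-in for a locally computed gradient on this rank.
    local_grad = torch.full_like(shared_grad, float(rank + 1))
    # Accumulate under a lock: += on a shared tensor is not atomic
    # across processes.
    with lock:
        shared_grad += local_grad
    barrier.wait()  # wait until every rank has accumulated
    # All ranks now see the same sum and apply the averaged update.
    avg = shared_grad / world_size
    print(f"rank {rank}: mean grad = {avg[0].item():.2f}")

if __name__ == "__main__":
    world_size = 4
    ctx = mp.get_context("spawn")
    grad = torch.zeros(8)
    grad.share_memory_()  # move the buffer into shared memory
    lock, barrier = ctx.Lock(), ctx.Barrier(world_size)
    procs = [ctx.Process(target=worker,
                         args=(r, world_size, grad, lock, barrier))
             for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```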
- 11 Jun, 2021 1 commit
nv-dlasalle authored
* Split from NCCL PR
* Fix type in comment
* Expand documentation for sparse_all_to_all_push
* Restore previous behavior in example
* Rework optimizer to use NCCL based on gradient location
* Allow running with the embedding on CPU while using NCCL for gradient exchange
* Optimize the single-partition case
* Fix pylint errors
* Add missing include
* Fix gradient indexing
* Fix line continuation
* Migrate 'first_step'
* Skip tests without enough GPUs to run NCCL
* Improve empty tensor handling for PyTorch 1.5
* Fix indentation
* Allow multiple NCCL communicators to coexist
* Improve handling of empty messages
* Update python/dgl/nn/pytorch/sparse_emb.py (co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>)
* Update python/dgl/nn/pytorch/sparse_emb.py (co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>)
* Keep empty tensors dimensionless
* th.empty -> th.tensor
* Preserve shape for empty tensors with non-zero dimensions
* Use shared state when the embedding is shared
* Add support for gathering an embedding
* Fix typo
* Fix more typos
* Fix backend call
* Use NodeDataLoader to take advantage of DDP
* Update training script to share memory
* Only squeeze the last dimension
* Better handle empty messages
* Keep the embedding on the target GPU device if dgl_sparse is false in the RGCN example
* Fix typo in comment
* Add asserts
* Improve documentation in example

Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
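For orientation, `sparse_all_to_all_push` routes each (index, gradient-row) pair of a sparse embedding gradient to the rank that owns that row, so every GPU applies updates only to its own partition. The sketch below expresses the same idea with stock `torch.distributed` collectives rather than DGL's NCCL wrapper; the contiguous range partition of node IDs is an assumption.

```python
import torch
import torch.distributed as dist

def sparse_all_to_all_push(indices, values, num_nodes):
    """Route each (index, gradient-row) pair to its owner rank and
    return the pairs this rank owns. Assumes node IDs are
    range-partitioned across ranks; `values` is 2-D (rows x dim).
    """
    world_size = dist.get_world_size()
    part = (num_nodes + world_size - 1) // world_size
    owner = indices // part  # destination rank of each row

    # Bucket rows by destination. Buckets may be empty, which (as the
    # commits above note) needs careful handling.
    send_idx = [indices[owner == r] for r in range(world_size)]
    send_val = [values[owner == r] for r in range(world_size)]

    # Exchange bucket sizes first so receive buffers can be allocated.
    sizes = torch.tensor([t.numel() for t in send_idx],
                         device=indices.device)
    recv_sizes = torch.empty_like(sizes)
    dist.all_to_all_single(recv_sizes, sizes)

    recv_idx = [torch.empty(int(n), dtype=indices.dtype,
                            device=indices.device) for n in recv_sizes]
    recv_val = [torch.empty(int(n), values.shape[1], dtype=values.dtype,
                            device=values.device) for n in recv_sizes]
    dist.all_to_all(recv_idx, send_idx)
    dist.all_to_all(recv_val, send_val)
    return torch.cat(recv_idx), torch.cat(recv_val)
```

With the NCCL backend every exchanged tensor, including the size counts, must live on the GPU, which is why the buffers above inherit the device of the inputs.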
- 25 Apr, 2021 1 commit
xiang song(charlie.song) authored
* Fix #2856
* upd
* Fix unit test
* upd
* upd
* upd
* Fix

Co-authored-by: Ubuntu <ubuntu@ip-172-31-57-25.ec2.internal>
- 27 Jan, 2021 1 commit
xiang song(charlie.song) authored
* Add sparse embedding for DGL and update the RGCN example
* upd
* Fix
* Revert "Fix" (reverts commit 4da87cdfb8b8c3506b7fc7376cd2385ba8045c2a)
* Fix
* upd
* upd
* Fix
* Add unit test and update implementation
* Fix
* Clean up RGCN example code
* upd
* upd
* Update
* Fix
* Update score
* Sparse for SAGE
* Remove model sparse
* upd
* upd
* Remove global norm
* Revert deletion of model_sparse.py
* Update according to comments
* Fix doc
* upd
* Fix test
* upd
* lint
* lint
* lint
* upd
* upd
* Clean up

Co-authored-by: Ubuntu <ubuntu@ip-172-31-56-220.ec2.internal>
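The core pattern behind this change can be shown with plain PyTorch: with `sparse=True`, backward produces a gradient only for the embedding rows a minibatch touches, so a sparse optimizer updates those rows instead of scanning the full table. A sketch of the general technique, not of DGL's own embedding and optimizer API:

```python
import torch
import torch.nn as nn

# Learnable per-node embeddings for a large graph.
num_nodes, dim = 1_000_000, 64
emb = nn.Embedding(num_nodes, dim, sparse=True)
opt = torch.optim.SparseAdam(emb.parameters(), lr=0.01)

batch = torch.randint(0, num_nodes, (1024,))
loss = emb(batch).pow(2).mean()  # stand-in for a real GNN loss
opt.zero_grad()
loss.backward()   # emb.weight.grad is a sparse COO tensor
opt.step()        # updates only the rows indexed by `batch`
```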