"examples/pytorch/git@developer.sourcefind.cn:OpenDAS/dgl.git" did not exist on "3c387988d7addc4a6b92785c12b64566d164bb55"
Commit 6140395f authored by Kai Zhang, committed by Facebook GitHub Bot

Use current device id in dist.barrier

Summary:
Pull Request resolved: https://github.com/facebookresearch/detectron2/pull/3350

`get_local_rank` relies on a global variable set by Detectron2's `launch` utility. Other frameworks may use Detectron2's distributed utils without launching through `launch`, so that global is never set. Use `torch.cuda.current_device` to get the current device instead.
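
For context, the `dist.barrier` side of this change lives in detectron2's comm.py and is not part of the hunk shown below. A minimal sketch of the idea, assuming a `synchronize` helper shaped like Detectron2's (the exact body here is an approximation, not the verbatim patch):

import torch
import torch.distributed as dist

def synchronize():
    # Barrier across all workers. Works even when the process was not
    # started through Detectron2's `launch`, because the device id comes
    # from CUDA itself rather than from a launcher-set global.
    if not dist.is_available() or not dist.is_initialized():
        return
    if dist.get_world_size() == 1:
        return
    if dist.get_backend() == dist.Backend.NCCL:
        # NCCL barriers run on a GPU; tell PyTorch which one this process owns.
        dist.barrier(device_ids=[torch.cuda.current_device()])
    else:
        dist.barrier()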

Reviewed By: HarounH, ppwwyyxx

Differential Revision: D30233746

fbshipit-source-id: 0b140ed5c1e7cd87ccf05235127f338ffc40a53d
parent adf223bd
@@ -148,12 +148,12 @@ def _distributed_worker(
         if i == machine_rank:
             comm._LOCAL_PROCESS_GROUP = pg
+    if backend in ["NCCL"]:
+        torch.cuda.set_device(local_rank)
     # synchronize is needed here to prevent a possible timeout after calling
     # init_process_group
     # See: https://github.com/facebookresearch/maskrcnn-benchmark/issues/172
     comm.synchronize()
-    if backend in ["NCCL"]:
-        torch.cuda.set_device(local_rank)
     ret = main_func(*args)
...
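
Note the reordering in the hunk above: `torch.cuda.set_device(local_rank)` now runs before `comm.synchronize()`, so `torch.cuda.current_device()` already points at this process's GPU by the time the NCCL barrier fires.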