"examples/pytorch/git@developer.sourcefind.cn:OpenDAS/dgl.git" did not exist on "3c387988d7addc4a6b92785c12b64566d164bb55"
Commit 6140395f authored by Kai Zhang, committed by Facebook GitHub Bot

Use current device id in dist.barrier

Summary:
Pull Request resolved: https://github.com/facebookresearch/detectron2/pull/3350

`get_local_rank` relies on a global variable set by Detectron2's `launch` utility. Other frameworks may use Detectron2's distributed utils without launching through `launch`, so that global is never set. Use `torch.cuda.current_device` to get the current device instead.
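
For context, the `dist.barrier` side of this change lives in detectron2's comm.py and is not part of the hunk shown below. A minimal sketch of the idea, assuming a `synchronize` helper shaped like Detectron2's (the exact body here is an approximation, not the verbatim patch):

import torch
import torch.distributed as dist

def synchronize():
    # Barrier across all workers. Works even when the process was not
    # started through Detectron2's `launch`, because the device id comes
    # from CUDA itself rather than from a launcher-set global.
    if not dist.is_available() or not dist.is_initialized():
        return
    if dist.get_world_size() == 1:
        return
    if dist.get_backend() == dist.Backend.NCCL:
        # NCCL barriers run on a GPU; tell PyTorch which one this process owns.
        dist.barrier(device_ids=[torch.cuda.current_device()])
    else:
        dist.barrier()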

Reviewed By: HarounH, ppwwyyxx

Differential Revision: D30233746

fbshipit-source-id: 0b140ed5c1e7cd87ccf05235127f338ffc40a53d
parent adf223bd
@@ -148,12 +148,12 @@ def _distributed_worker(
         if i == machine_rank:
             comm._LOCAL_PROCESS_GROUP = pg
+    if backend in ["NCCL"]:
+        torch.cuda.set_device(local_rank)
     # synchronize is needed here to prevent a possible timeout after calling
     # init_process_group
     # See: https://github.com/facebookresearch/maskrcnn-benchmark/issues/172
     comm.synchronize()
-    if backend in ["NCCL"]:
-        torch.cuda.set_device(local_rank)
     ret = main_func(*args)
...
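
Note the reordering in the hunk above: `torch.cuda.set_device(local_rank)` now runs before `comm.synchronize()`, so `torch.cuda.current_device()` already points at this process's GPU by the time the NCCL barrier fires.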