Commit 6140395f authored by Kai Zhang, committed by Facebook GitHub Bot

Use current device id in dist.barrier

Summary:
Pull Request resolved: https://github.com/facebookresearch/detectron2/pull/3350

`get_local_rank` relies on a global variable that is set by Detectron2's `launch` utility. Other frameworks may use Detectron2's distributed utils without launching through Detectron2's `launch`, so use `torch.cuda.current_device` to get the current device instead.
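For reference, a minimal sketch of the idea: a barrier helper that passes the process's current CUDA device to `dist.barrier` when the NCCL backend is in use. The function name and structure below are illustrative, not necessarily the exact code in `detectron2.utils.comm`.

import torch
import torch.distributed as dist


def synchronize():
    # Barrier across all processes in distributed training.
    # Sketch of the approach described above, not the verbatim Detectron2 helper.
    if not dist.is_available() or not dist.is_initialized():
        return
    if dist.get_world_size() == 1:
        return
    if dist.get_backend() == dist.Backend.NCCL:
        # Pass the device this process actually uses, instead of deriving it
        # from a local-rank global that only Detectron2's launch() sets.
        dist.barrier(device_ids=[torch.cuda.current_device()])
    else:
        dist.barrier()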

Reviewed By: HarounH, ppwwyyxx

Differential Revision: D30233746

fbshipit-source-id: 0b140ed5c1e7cd87ccf05235127f338ffc40a53d
parent adf223bd
@@ -148,12 +148,12 @@ def _distributed_worker(
         if i == machine_rank:
             comm._LOCAL_PROCESS_GROUP = pg
 
+    if backend in ["NCCL"]:
+        torch.cuda.set_device(local_rank)
+
     # synchronize is needed here to prevent a possible timeout after calling
     # init_process_group
     # See: https://github.com/facebookresearch/maskrcnn-benchmark/issues/172
     comm.synchronize()
 
-    if backend in ["NCCL"]:
-        torch.cuda.set_device(local_rank)
-
     ret = main_func(*args)