Unverified commit dfb32264 authored by Wenxuan Tan, committed by GitHub

Use device_id in dist init to reduce NCCL communicator warmup & creation overhead (#5728)

parent 63c13a2c
@@ -1055,6 +1055,11 @@ def init_distributed_environment(
         world_size=world_size,
         rank=rank,
         timeout=timeout,
+        device_id=(
+            torch.device(f"cuda:{torch.cuda.current_device()}")
+            if torch.cuda.is_available()
+            else None
+        ),  # Allow NCCL to eagerly init its communicator
     )
     # set the local rank
...
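The device selection the diff inlines at the `torch.distributed.init_process_group()` call site can be factored out as a small helper. The sketch below is illustrative, not part of the commit: `resolve_device_id` is a hypothetical name, and its inputs stand in for `torch.cuda.is_available()` and `torch.cuda.current_device()`. Note that `init_process_group` only accepts `device_id` on recent PyTorch releases, and passing `None` into `torch.device()` itself would raise, which is why the conditional must yield `None` directly when CUDA is absent.

```python
# Hypothetical helper mirroring the conditional this commit adds at the
# torch.distributed.init_process_group() call site.

from typing import Optional


def resolve_device_id(cuda_available: bool, current_device: int) -> Optional[str]:
    """Return the string to wrap in torch.device(), or None without CUDA.

    Supplying a concrete device_id lets NCCL bind the process to a single
    GPU up front and create its communicator eagerly, instead of paying the
    lazy warmup cost on the first collective call.
    """
    if cuda_available:
        return f"cuda:{current_device}"
    # No CUDA: init_process_group must receive device_id=None, not
    # torch.device(None), which would raise a TypeError.
    return None


print(resolve_device_id(True, 3))   # cuda:3
print(resolve_device_id(False, 0))  # None
```

A caller would then pass `device_id=torch.device(s) if (s := resolve_device_id(...)) else None`, which is exactly the shape of the inlined expression in the diff above.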