Commit a6a9141b, authored by Alp Dener, committed by GitHub


[PyTorch] Missing intra-domain ranks list when initializing Userbuffers with data parallelism (#1305)

Added the missing list of intra-domain ranks in initialize_ub for the case num_domains > 1.
Signed-off-by: Alp Dener <adener@nvidia.com>
parent 4b8ffef4
@@ -234,6 +234,7 @@ def initialize_ub(
 ranks_per_domain_list, backend=bootstrap_backend
 )
 local_rank = torch.distributed.get_rank(intra_domain_group)
+intra_domain_ranks = torch.distributed.get_process_group_ranks(intra_domain_group)
 inter_domain_group, _ = torch.distributed.new_subgroups_by_enumeration(
 [list(ranks) for ranks in zip(*ranks_per_domain_list)],
...
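
The rank layout behind this hunk can be illustrated without torch. The following is a minimal sketch, not the code from initialize_ub: the helper names build_rank_layout and intra_domain_ranks_of are hypothetical, and the contiguous split of ranks into domains is an assumption, since the diff does not show how ranks_per_domain_list is built. Only the zip(*ranks_per_domain_list) expression is taken from the diff. The lookup helper returns what torch.distributed.get_process_group_ranks(intra_domain_group) would be expected to return under that assumed layout.

```python
# Pure-Python sketch of the intra-/inter-domain rank layout (no torch required).
# ASSUMPTION: domains are contiguous blocks of ranks; the diff does not show this.


def build_rank_layout(world_size, ranks_per_domain):
    """Return (ranks_per_domain_list, inter_domain_ranks_list) for a contiguous split."""
    if world_size % ranks_per_domain != 0:
        raise ValueError("world_size must be a multiple of ranks_per_domain")
    num_domains = world_size // ranks_per_domain
    ranks_per_domain_list = [
        list(range(d * ranks_per_domain, (d + 1) * ranks_per_domain))
        for d in range(num_domains)
    ]
    # Same expression as in the diff: the i-th rank of every domain
    # forms one inter-domain group.
    inter_domain_ranks_list = [list(ranks) for ranks in zip(*ranks_per_domain_list)]
    return ranks_per_domain_list, inter_domain_ranks_list


def intra_domain_ranks_of(rank, ranks_per_domain_list):
    """Hypothetical stand-in for querying the global ranks of a rank's own domain."""
    for ranks in ranks_per_domain_list:
        if rank in ranks:
            return ranks
    raise ValueError(f"rank {rank} does not belong to any domain")


if __name__ == "__main__":
    intra, inter = build_rank_layout(world_size=8, ranks_per_domain=4)
    print(intra)                            # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(inter)                            # [[0, 4], [1, 5], [2, 6], [3, 7]]
    print(intra_domain_ranks_of(5, intra))  # [4, 5, 6, 7]
```

With two domains, rank 5 has local rank 1 inside its domain, but its intra-domain group consists of global ranks 4 to 7. The added intra_domain_ranks line makes that list available when num_domains > 1.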