Commit 5e1a373d authored by Aaron Hao, committed by GitHub

[BUG] Fix rank calculation in NCCLWeightTransferEngine (#36940)


Signed-off-by: hao-aaron <ahao@anyscale.com>
parent 572c776b
@@ -132,7 +132,7 @@ class NCCLWeightTransferEngine(
 # Calculate the global rank in the trainer-worker process group
 # Must account for data parallel to get unique ranks across all workers
-dp_rank = self.parallel_config.data_parallel_rank
+dp_rank = self.parallel_config.data_parallel_index
 world_size_per_dp = self.parallel_config.world_size # TP * PP
 rank_within_dp = self.parallel_config.rank
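
Note: the hunk ends before the three values are combined, so the rest of the calculation is not visible here. The sketch below shows one common way to flatten a data-parallel index and a per-replica rank into a single unique rank. The helper name `global_worker_rank` and its argument checks are hypothetical, not part of this commit, and the actual code may add further offsets.

```python
def global_worker_rank(dp_index: int, world_size_per_dp: int, rank_within_dp: int) -> int:
    """Hypothetical helper: flatten (dp_index, rank_within_dp) into one rank.

    Assumes every data-parallel replica holds exactly `world_size_per_dp`
    workers (TP * PP) and that `rank_within_dp` counts from 0 inside a replica.
    """
    if world_size_per_dp <= 0:
        raise ValueError("world_size_per_dp must be positive")
    if dp_index < 0:
        raise ValueError("dp_index must be non-negative")
    if not 0 <= rank_within_dp < world_size_per_dp:
        raise ValueError("rank_within_dp must be in [0, world_size_per_dp)")
    # Each replica owns a contiguous block of world_size_per_dp ranks.
    return dp_index * world_size_per_dp + rank_within_dp


# Example: 2 data-parallel replicas, each with TP * PP = 4 workers.
ranks = [global_worker_rank(dp, 4, r) for dp in range(2) for r in range(4)]
```

With this layout, two workers only collide if they report the same data-parallel index, which is why the value assigned to `dp_rank` has to be unique per replica.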