Commit 96f4c5d2 authored by MaximumEntropy

Undo parallel state changes


Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
parent 18b26ec6
@@ -53,7 +53,6 @@ def initialize_model_parallel(
     pipeline_model_parallel_size: int = 1,
     virtual_pipeline_model_parallel_size: Optional[int] = None,
     pipeline_model_parallel_split_rank: Optional[int] = None,
-    untie_embeddings_and_output_weights: bool = False,
 ) -> None:
     """
     Initialize model data parallel groups.
@@ -94,9 +93,6 @@ def initialize_model_parallel(
             pipeline_model_parallel_split_rank is 3, then ranks 0-2
             will be the encoder and ranks 3-7 will be the decoder.
-        untie_embeddings_and_output_weights: whether to use separate embedding and output layer.
-            this affects the computation of embedding groups
     Let's say we have a total of 16 GPUs denoted by g0 ... g15 and we
     use 2 GPUs to parallelize the model tensor, and 4 GPUs to parallelize
     the model pipeline. The present function will
@@ -204,19 +200,13 @@ def initialize_model_parallel(
         # Setup embedding group (to exchange gradients between
         # first and last stages).
         if len(ranks) > 1:
-            if untie_embeddings_and_output_weights:
-                embedding_ranks = [ranks[0]]
-            else:
-                embedding_ranks = [ranks[0], ranks[-1]]
+            embedding_ranks = [ranks[0], ranks[-1]]
             position_embedding_ranks = [ranks[0]]
             if pipeline_model_parallel_split_rank is not None:
                 if ranks[pipeline_model_parallel_split_rank] not in embedding_ranks:
-                    if untie_embeddings_and_output_weights:
-                        embedding_ranks = [ranks[0], ranks[pipeline_model_parallel_split_rank]]
-                    else:
-                        embedding_ranks = [ranks[0],
-                                           ranks[pipeline_model_parallel_split_rank],
-                                           ranks[-1]]
+                    embedding_ranks = [ranks[0],
+                                       ranks[pipeline_model_parallel_split_rank],
+                                       ranks[-1]]
                 if ranks[pipeline_model_parallel_split_rank] not in position_embedding_ranks:
                     position_embedding_ranks = [ranks[0],
                                                 ranks[pipeline_model_parallel_split_rank]]