Unverified Commit 21946a8c authored by SeongBeomLEE, committed by GitHub


[fix] Change the condition of ValueError in "convert_checkpoint_from_transformers_to_megatron" (#24769)

* fix: half inference error

norm_factor remains torch.float32 after calling model.half().

So I changed it to a registered buffer, so that it is cast to torch.float16 when model.half() is called (see the sketch after this list).

* fix: pass persistent=False when registering the buffer

* run make style

* [fix] Change the condition of ValueError in convert_checkpoint_from_transformers_to_megatron

* [fix] error wording: layers -> attention heads
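A minimal sketch of why registering norm_factor as a buffer fixes the half-precision issue (the ToyAttention module below is illustrative, not the actual transformers code): plain tensor attributes are left untouched by model.half(), while registered buffers are cast along with the module's parameters, and persistent=False keeps the buffer out of the state_dict so existing checkpoints still load.

```python
import math

import torch
from torch import nn


class ToyAttention(nn.Module):
    """Illustrative module only; not the transformers implementation."""

    def __init__(self, head_dim: int = 64):
        super().__init__()
        # Plain attribute: model.half() does not touch it, so it stays torch.float32.
        self.norm_factor_attr = torch.tensor(math.sqrt(head_dim))
        # Registered buffer: follows the module's dtype/device conversions.
        # persistent=False keeps it out of the state_dict, so checkpoints are unchanged.
        self.register_buffer("norm_factor", torch.tensor(math.sqrt(head_dim)), persistent=False)


model = ToyAttention().half()
print(model.norm_factor_attr.dtype)  # torch.float32 (not cast)
print(model.norm_factor.dtype)       # torch.float16 (cast by .half())
```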
@@ -741,11 +741,18 @@ def convert_checkpoint_from_transformers_to_megatron(args):
     # Transformer layers
     print("converting transformer layers")
-    if config.num_hidden_layers % args.target_tensor_model_parallel_size != 0:
+    if config.num_attention_heads % args.target_tensor_model_parallel_size != 0:
         raise ValueError(
-            f"Number of layers ({config.num_hidden_layers}) must be divisible by number of tensor parallelism"
+            f"Number of attention heads ({config.num_attention_heads}) must be divisible by number of tensor parallelism"
             f" ({args.target_tensor_model_parallel_size})"
         )
+    if config.num_hidden_layers % args.target_pipeline_model_parallel_size != 0:
+        raise ValueError(
+            f"Number of layers ({config.num_hidden_layers}) must be divisible by number of pipeline parallelism"
+            f" ({args.target_pipeline_model_parallel_size})"
+        )
     num_layers = config.num_hidden_layers // args.target_pipeline_model_parallel_size
     layer_re = re.compile(r"transformer.h\.(\d+)\.([a-z0-9_.]+)\.([a-z]+)")
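For context, a minimal sketch of the sharding arithmetic that motivates the two checks (the config values below are hypothetical, not taken from any real model): attention heads are partitioned across tensor-parallel ranks, and hidden layers are partitioned across pipeline-parallel stages, so each count must divide evenly by the corresponding parallel size.

```python
# Hypothetical values for illustration only.
num_attention_heads = 16
num_hidden_layers = 24
target_tensor_model_parallel_size = 4     # must divide num_attention_heads
target_pipeline_model_parallel_size = 6   # must divide num_hidden_layers

# These mirror the divisibility checks added in the diff above.
assert num_attention_heads % target_tensor_model_parallel_size == 0
assert num_hidden_layers % target_pipeline_model_parallel_size == 0

heads_per_tp_rank = num_attention_heads // target_tensor_model_parallel_size     # 4 heads per tensor-parallel rank
layers_per_pp_stage = num_hidden_layers // target_pipeline_model_parallel_size   # 4 layers per pipeline stage
print(heads_per_tp_rank, layers_per_pp_stage)
```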