Commit 7ce373f3 authored by Deepak Narayanan

Bugfix in megatron/training.py: correct global_batch_size computation

Prevents data_loader from running out of training examples
parent 9d4c735a
@@ -716,7 +716,7 @@ def build_train_valid_test_data_iterators(
     if mpu.get_tensor_model_parallel_rank() == 0:
         # Rank, size, and global batch size.
         data_parallel_size = mpu.get_data_parallel_world_size()
-        global_batch_size = args.batch_size * data_parallel_size
+        global_batch_size = args.batch_size * data_parallel_size * args.num_microbatches_in_minibatch
         # Number of train/valid/test samples.
         train_iters = args.train_iters
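For context, here is a minimal sketch (not the actual Megatron-LM code) of why the missing factor exhausts the data loader: the number of samples requested from the dataset builders is derived from global_batch_size, so omitting the num_microbatches_in_minibatch factor requests only a fraction of the samples that training actually consumes. The helper function and the numeric values below are hypothetical; only the argument names mirror the diff above.

```python
# Hypothetical illustration of the bug fixed in this commit: sizing the dataset
# from an undercounted global_batch_size requests too few training examples.
def samples_requested(batch_size, data_parallel_size,
                      num_microbatches_in_minibatch, train_iters,
                      include_microbatch_factor):
    # Old code: global batch = per-GPU batch * data-parallel replicas.
    global_batch_size = batch_size * data_parallel_size
    if include_microbatch_factor:
        # Fixed code: also multiply by microbatches per training step.
        global_batch_size *= num_microbatches_in_minibatch
    # Total samples the data loader is asked to provide for training.
    return train_iters * global_batch_size

# Example: micro-batch size 8, data-parallel size 4, 16 microbatches per step.
old = samples_requested(8, 4, 16, train_iters=1000, include_microbatch_factor=False)
new = samples_requested(8, 4, 16, train_iters=1000, include_microbatch_factor=True)
print(old, new)  # 32000 vs 512000: the old value is 16x too small, so the
                 # data loader runs out of training examples mid-run.
```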