OpenDAS / Megatron-LM

Commit 7ce373f3 authored Oct 29, 2020 by Deepak Narayanan
Bugfix in megatron/training.py: correct global_batch_size computation
Prevents data_loader from running out of training examples
Parent: 9d4c735a
Showing 1 changed file with 1 addition and 1 deletion:

megatron/training.py (+1, -1)
@@ -716,7 +716,7 @@ def build_train_valid_test_data_iterators(
     if mpu.get_tensor_model_parallel_rank() == 0:

         # Rank, size, and global batch size.
         data_parallel_size = mpu.get_data_parallel_world_size()
-        global_batch_size = args.batch_size * data_parallel_size
+        global_batch_size = args.batch_size * data_parallel_size * args.num_microbatches_in_minibatch
         # Number of train/valid/test samples.
         train_iters = args.train_iters
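The corrected line matters because the train/valid/test sample counts are derived from global_batch_size a few lines below. A minimal sketch of the arithmetic, with purely illustrative values (the variable names mirror args.batch_size, get_data_parallel_world_size(), args.num_microbatches_in_minibatch, and args.train_iters from the hunk above; none of the numbers come from the repository):

    # Illustrative values only; the point is the factor-of-N undercount.
    batch_size = 4                       # per-replica microbatch size (args.batch_size)
    data_parallel_size = 8               # data-parallel replicas
    num_microbatches_in_minibatch = 16   # microbatches consumed per iteration
    train_iters = 1000                   # training iterations (args.train_iters)

    # Old (buggy) computation: ignores the microbatch count, so the dataset
    # built from it holds far fewer samples than training will actually pull.
    old_global_batch_size = batch_size * data_parallel_size
    # Fixed computation from this commit:
    new_global_batch_size = batch_size * data_parallel_size * num_microbatches_in_minibatch

    print(train_iters * old_global_batch_size)  # 32000 samples requested from the dataset builder
    print(train_iters * new_global_batch_size)  # 512000 samples actually consumed during training

With the old formula the dataset builder sizes the splits for 32000 samples while the data_loader draws 512000, which is why it ran out of training examples; the fix makes the requested and consumed counts agree.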