OpenDAS / Megatron-LM
Commit b037a69e
Authored Jan 19, 2022 by Vijay Korthikanti

using all_gather instead of gather (nccl does not support gather)

Parent: a7ee77ea
Showing 1 changed file with 4 additions and 6 deletions
megatron/checkpointing.py (+4 -6) @ b037a69e

@@ -154,13 +154,11 @@ def get_rng_state():
     if torch.distributed.is_initialized() and \
             mpu.get_data_parallel_world_size() > 1 and \
             args.data_parallel_random_init:
-        if mpu.get_data_parallel_rank() == 0:
-            rng_state_list = \
-                [None for i in range(mpu.get_data_parallel_world_size())]
-        torch.distributed.gather_object(
-            rng_state,
+        rng_state_list = \
+            [None for i in range(mpu.get_data_parallel_world_size())]
+        torch.distributed.all_gather_object(
             rng_state_list,
-            dst=mpu.get_data_parallel_src_rank(),
+            rng_state,
             group=mpu.get_data_parallel_group())
     else:
         rng_state_list = [rng_state]
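The change swaps a collective that delivers results to a single destination rank for one that delivers them to every rank, which is why the rank-0 guard around the list allocation goes away. A minimal, torch-free sketch of that difference, assuming four data-parallel ranks and placeholder per-rank states. The `simulate_*` helpers are hypothetical and only model the semantics; they are not `torch.distributed` calls.

```python
# Torch-free simulation of gather vs. all_gather semantics.
# All names below are hypothetical stand-ins, not torch.distributed APIs.


def simulate_gather_object(objs, dst):
    """Model gather: only rank `dst` ends up holding the full list."""
    world_size = len(objs)
    results = []
    for rank in range(world_size):
        if rank == dst:
            # Only the destination rank allocates and fills an output list.
            out = [None] * world_size
            for src, obj in enumerate(objs):
                out[src] = obj
        else:
            # Assumption for this model: other ranks hold no output list.
            out = None
        results.append(out)
    return results


def simulate_all_gather_object(objs):
    """Model all_gather: every rank ends up holding the full list."""
    world_size = len(objs)
    results = []
    for rank in range(world_size):
        # Every rank must allocate an output list of length world_size.
        out = [None] * world_size
        for src, obj in enumerate(objs):
            out[src] = obj
        results.append(out)
    return results


# Placeholder per-rank RNG states (made-up values for illustration only).
rng_states = [{"seed": 1234 + rank} for rank in range(4)]

gathered = simulate_gather_object(rng_states, dst=0)
all_gathered = simulate_all_gather_object(rng_states)
```

The diff also shows the argument order flipping: `gather_object` takes the local object first and the output list second, while `all_gather_object` takes the output list first and the local object second.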