Megatron-LM commit cbad126a
Authored Mar 03, 2021 by Deepak Narayanan
Parent: 78cf869f

    Bring back call to ring_exchange() in _communicate()
Showing 1 changed file with 36 additions and 26 deletions.

megatron/p2p_communication.py (+36, -26):
@@ -21,7 +21,8 @@ from megatron import get_args
 from megatron import mpu
 
 
-def _communicate(tensor_send_next, tensor_send_prev, recv_prev, recv_next):
+def _communicate(tensor_send_next, tensor_send_prev, recv_prev, recv_next,
+                 use_ring_exchange=False):
     """Communicate tensors between stages. Used as helper method in other
     communication methods that are used in megatron/schedules.py.
 
@@ -34,6 +35,8 @@ def _communicate(tensor_send_next, tensor_send_prev, recv_prev, recv_next):
             previous rank.
         recv_next: boolean for whether tensor should be received from
             next rank.
+        use_ring_exchange: boolean for whether torch.distributed.ring_exchange()
+            API should be used.
 
     Returns:
         (tensor_recv_prev, tensor_recv_next)
@@ -73,6 +76,13 @@ def _communicate(tensor_send_next, tensor_send_prev, recv_prev, recv_next):
         tensor_send_prev = mpu.split_tensor_into_1d_equal_chunks(tensor_send_prev)
 
     # Send tensors in both the forward and backward directions as appropriate.
-    ops = []
-    if tensor_send_prev is not None:
-        send_prev_op = torch.distributed.P2POp(
+    if use_ring_exchange:
+        torch.distributed.ring_exchange(tensor_send_prev=tensor_send_prev,
+                                        tensor_recv_prev=tensor_recv_prev,
+                                        tensor_send_next=tensor_send_next,
+                                        tensor_recv_next=tensor_recv_next,
+                                        group=mpu.get_pipeline_model_parallel_group())
+    else:
+        ops = []
+        if tensor_send_prev is not None:
+            send_prev_op = torch.distributed.P2POp(
...
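
The last hunk is truncated above, but its else branch just re-indents the pre-existing batched point-to-point path under the new flag. As a rough, self-contained sketch of that pattern, not the commit's exact code (plain prev_rank/next_rank arguments stand in here for Megatron's mpu rank helpers):

import torch
import torch.distributed as dist

def batched_p2p(tensor_send_prev, tensor_recv_prev,
                tensor_send_next, tensor_recv_next,
                prev_rank, next_rank):
    """Sketch of the non-ring-exchange path: queue isend/irecv ops with
    both pipeline neighbors, then launch them as a single batch."""
    ops = []
    if tensor_send_prev is not None:
        ops.append(dist.P2POp(dist.isend, tensor_send_prev, prev_rank))
    if tensor_recv_prev is not None:
        ops.append(dist.P2POp(dist.irecv, tensor_recv_prev, prev_rank))
    if tensor_send_next is not None:
        ops.append(dist.P2POp(dist.isend, tensor_send_next, next_rank))
    if tensor_recv_next is not None:
        ops.append(dist.P2POp(dist.irecv, tensor_recv_next, next_rank))
    if ops:
        # batch_isend_irecv submits all ops together and returns one request
        # per op; waiting on each completes the bidirectional exchange.
        for req in dist.batch_isend_irecv(ops):
            req.wait()

Submitting the sends and receives as one batch lets the backend pair them up consistently across ranks, rather than each rank blocking on whichever call it happens to issue first.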
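
Note that ring_exchange() is not a stock torch.distributed API; to my knowledge it is only available in custom PyTorch builds, which is presumably why the new flag defaults to False. A hypothetical call site (tensor names illustrative only):

# Exchange with both pipeline neighbors in one shot; returns
# (tensor_recv_prev, tensor_recv_next) per the docstring above.
recv_from_prev, recv_from_next = _communicate(
    tensor_send_next=output_tensor,   # activations for the next stage
    tensor_send_prev=input_gradient,  # gradients for the previous stage
    recv_prev=True,
    recv_next=True,
    use_ring_exchange=True)           # needs a build that provides ring_exchange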