OpenDAS
Megatron-LM
Commit 7ffea978
Authored Feb 05, 2021 by Deepak Narayanan

Use torch.cuda.synchronize() right after calling batch_isend_irecv() communication API

Parent: 2096d356
Changes: 1 changed file with 2 additions and 0 deletions

megatron/training.py (+2, -0)
...
@@ -351,6 +351,8 @@ def communicate(tensor_send_next, tensor_send_prev, recv_forward, recv_backward)
     reqs = torch.distributed.batch_isend_irecv(ops)
     for req in reqs:
         req.wait()
+    # Temporary workaround for batch_isend_irecv() race condition.
+    torch.cuda.synchronize()
     return tensor_recv_prev, tensor_recv_next
...
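For readers who want to try the pattern outside of `communicate()`, the workaround can be sketched as a standalone helper. This is a minimal sketch, not the code from this commit: the names `wait_and_synchronize`, `exchange_with_neighbors`, `next_rank`, and `prev_rank` are hypothetical, the `torch.cuda.is_available()` guard is an addition that the commit does not have, and `torch` is imported lazily only so the waiting logic can be exercised without a GPU build.

```python
from typing import Callable, Iterable, Optional


def wait_and_synchronize(reqs: Iterable,
                         synchronize: Optional[Callable[[], None]] = None) -> None:
    """Wait on every request, then block until the device is idle.

    `synchronize` is a hypothetical hook for testing; by default the
    helper falls back to torch.cuda.synchronize() when CUDA is available.
    """
    for req in reqs:
        req.wait()
    if synchronize is None:
        import torch  # lazy import (assumption: keeps the helper testable without CUDA)
        if not torch.cuda.is_available():
            return  # guard added for this sketch; the commit calls synchronize unconditionally
        synchronize = torch.cuda.synchronize
    synchronize()


def exchange_with_neighbors(tensor_send_next, tensor_recv_prev, next_rank, prev_rank):
    """Send one tensor to the next rank and receive one from the previous rank.

    Assumes torch.distributed is already initialized and that both tensors
    are preallocated on the right device.
    """
    import torch.distributed as dist

    ops = [
        dist.P2POp(dist.isend, tensor_send_next, next_rank),
        dist.P2POp(dist.irecv, tensor_recv_prev, prev_rank),
    ]
    reqs = dist.batch_isend_irecv(ops)
    # Same ordering as the diff above: wait on all requests, then synchronize.
    wait_and_synchronize(reqs)
    return tensor_recv_prev
```

The synchronize call runs once, after all requests have been waited on, which matches where the two added lines sit in the diff. Because it blocks the host until all queued GPU work finishes, it likely costs some overlap between communication and computation, which fits the commit's description of it as temporary.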