OpenDAS / Megatron-LM

Commit 1b3dfa2f authored Feb 05, 2021 by Deepak Narayanan
Committed by Jared Casper, Feb 05, 2021

Use torch.cuda.synchronize() right after calling batch_isend_irecv() communication API

Parent: be473a5b
Changes: 1 changed file with 2 additions and 0 deletions

megatron/training.py (+2 -0)
@@ -347,6 +347,8 @@ def communicate(tensor_send_next, tensor_send_prev, recv_forward, recv_backward)
     reqs = torch.distributed.batch_isend_irecv(ops)
     for req in reqs:
         req.wait()
+    # Temporary workaround for batch_isend_irecv() race condition.
+    torch.cuda.synchronize()
     return tensor_recv_prev, tensor_recv_next
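The commit adds only the two lines marked `+`. The function below is a rough, self-contained sketch of the surrounding pattern. It posts the sends and receives in a single `torch.distributed.batch_isend_irecv()` call, waits on every returned request, and then applies the same `torch.cuda.synchronize()` workaround. It is not Megatron-LM's `communicate()`. The name `exchange_with_neighbors` and the explicit `recv_shape`, `prev_rank`, `next_rank`, `dtype` and `device` parameters are hypothetical stand-ins for values the real function obtains elsewhere. The empty-ops guard and the `torch.cuda.is_available()` check are also additions of this sketch.

```python
import torch
import torch.distributed as dist


def exchange_with_neighbors(tensor_send_next, tensor_send_prev,
                            recv_forward, recv_backward,
                            recv_shape, prev_rank, next_rank,
                            dtype=torch.float32, device=None):
    """Hypothetical pipeline-stage exchange (sketch, not Megatron-LM code).

    recv_forward:  receive a tensor from the previous stage (prev_rank).
    recv_backward: receive a tensor from the next stage (next_rank).
    recv_shape, prev_rank, next_rank, dtype and device are assumed inputs.
    """
    # Allocate receive buffers only for the directions that are requested.
    tensor_recv_prev = None
    tensor_recv_next = None
    if recv_forward:
        tensor_recv_prev = torch.empty(recv_shape, dtype=dtype, device=device)
    if recv_backward:
        tensor_recv_next = torch.empty(recv_shape, dtype=dtype, device=device)

    # Describe each point-to-point operation so they can be posted together.
    ops = []
    if tensor_send_prev is not None:
        ops.append(dist.P2POp(dist.isend, tensor_send_prev, prev_rank))
    if tensor_recv_prev is not None:
        ops.append(dist.P2POp(dist.irecv, tensor_recv_prev, prev_rank))
    if tensor_send_next is not None:
        ops.append(dist.P2POp(dist.isend, tensor_send_next, next_rank))
    if tensor_recv_next is not None:
        ops.append(dist.P2POp(dist.irecv, tensor_recv_next, next_rank))

    # Post all ops in one batch and wait on every returned request, as the
    # diff's context lines do. Skipped when there is nothing to communicate.
    if ops:
        reqs = dist.batch_isend_irecv(ops)
        for req in reqs:
            req.wait()

    # The workaround from this commit: block the host until all work queued
    # on the current CUDA device has finished before the received tensors
    # are used. Guarded so the sketch also runs on CPU-only machines.
    if torch.cuda.is_available():
        torch.cuda.synchronize()

    return tensor_recv_prev, tensor_recv_next
```

Each `isend` posted on one rank must be matched by an `irecv` of the same shape and dtype on the peer rank. For a call with no sends and no receives, the sketch returns `(None, None)` without touching the process group.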