"git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "1055175a1896fc7592bd506b5d8d562fafd61a01"
Commit 1b3dfa2f authored by Deepak Narayanan, committed by Jared Casper

Use torch.cuda.synchronize() right after calling batch_isend_irecv() communication API

parent be473a5b
@@ -347,6 +347,8 @@ def communicate(tensor_send_next, tensor_send_prev, recv_forward, recv_backward)
     reqs = torch.distributed.batch_isend_irecv(ops)
     for req in reqs:
         req.wait()
+    # Temporary workaround for batch_isend_irecv() race condition.
+    torch.cuda.synchronize()
     return tensor_recv_prev, tensor_recv_next
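For context, the pattern this commit modifies can be sketched in isolation. The following is a hypothetical reconstruction of a `communicate()`-style pipeline-parallel exchange (the argument names follow the diff hunk header; the peer ranks, tensor shapes, and process-group initialization are assumptions, not taken from the actual Megatron-LM source):

```python
import torch
import torch.distributed as dist


def communicate(tensor_send_next, tensor_send_prev,
                recv_forward, recv_backward):
    """Sketch of a batched point-to-point exchange between pipeline stages.

    Builds a list of P2POp operations, launches them together with
    batch_isend_irecv(), waits on every returned request, and then calls
    torch.cuda.synchronize() as the temporary workaround for the
    batch_isend_irecv() race condition described in the commit message.
    """
    ops = []
    tensor_recv_prev = None
    tensor_recv_next = None
    rank = dist.get_rank()
    world = dist.get_world_size()

    # Hypothetical peer assignment: previous/next pipeline stage by rank.
    if tensor_send_prev is not None:
        ops.append(dist.P2POp(dist.isend, tensor_send_prev, (rank - 1) % world))
    if recv_forward:
        # Shape assumption: received activations match the sent ones.
        tensor_recv_prev = torch.empty_like(tensor_send_next)
        ops.append(dist.P2POp(dist.irecv, tensor_recv_prev, (rank - 1) % world))
    if tensor_send_next is not None:
        ops.append(dist.P2POp(dist.isend, tensor_send_next, (rank + 1) % world))
    if recv_backward:
        tensor_recv_next = torch.empty_like(tensor_send_prev)
        ops.append(dist.P2POp(dist.irecv, tensor_recv_next, (rank + 1) % world))

    if ops:
        reqs = dist.batch_isend_irecv(ops)
        for req in reqs:
            req.wait()
        # Temporary workaround for batch_isend_irecv() race condition:
        # wait() alone was observed to be insufficient before reading the
        # received tensors, so synchronize the whole device.
        torch.cuda.synchronize()

    return tensor_recv_prev, tensor_recv_next
```

The workaround is deliberately heavy-handed: `torch.cuda.synchronize()` blocks the host on all CUDA streams, not just the communication stream, which is why the comment in the diff labels it temporary.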