Commit 9ff6f473 authored by mshoeybi's avatar mshoeybi Committed by Deepak Narayanan

Allocate tensor in `communicate()` method directly on GPU (instead of allocating on CPU and then moving to GPU)
parent fdcaeba0
...
@@ -249,11 +249,13 @@ def communicate(tensor_send_next, tensor_send_prev, recv_forward, recv_backward)
     if recv_forward:
         tensor_recv_prev = torch.empty(tensor_shape,
                                        requires_grad=True,
-                                       dtype=args.params_dtype).cuda()
+                                       device=torch.cuda.current_device(),
+                                       dtype=args.params_dtype)
     if recv_backward:
         tensor_recv_next = torch.empty(tensor_shape,
                                        requires_grad=True,
-                                       dtype=args.params_dtype).cuda()
+                                       device=torch.cuda.current_device(),
+                                       dtype=args.params_dtype)
 
     # Send tensors in both the forward and backward directions as appropriate.
     torch.distributed.ring_exchange(tensor_send_prev=tensor_send_prev,
...
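For context, a minimal standalone sketch of the two allocation paths contrasted by this commit. It is not part of the change itself; tensor_shape and params_dtype below are hypothetical stand-ins for the values taken from args in communicate(), and it assumes a CUDA device is available.

# Sketch: allocate a receive buffer on the GPU directly vs. via a CPU staging copy.
import torch

tensor_shape = (1024, 1024)          # placeholder for the real pipeline tensor shape
params_dtype = torch.float16         # placeholder for args.params_dtype

if torch.cuda.is_available():
    # Old path: torch.empty() creates the buffer in host memory first,
    # then .cuda() performs a separate host-to-device copy.
    cpu_then_gpu = torch.empty(tensor_shape,
                               requires_grad=True,
                               dtype=params_dtype).cuda()

    # New path: passing device= makes torch.empty() allocate directly on the
    # current GPU, skipping the intermediate CPU buffer and the extra copy.
    direct_gpu = torch.empty(tensor_shape,
                             requires_grad=True,
                             device=torch.cuda.current_device(),
                             dtype=params_dtype)

    # Both buffers end up on the same device; only the allocation path differs.
    assert cpu_then_gpu.device == direct_gpu.device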