Fix a crash in NeMo 2.0 during module._apply(lambda t: t.cpu()) (#1502)

* Fix a crash with module._apply(lambda t: t.cpu())

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Add comments

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

* Make sure tensor is moved to dst device before quantizer quantizes

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

---------

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

Commit 9351a179, authored Feb 25, 2025 by guyueh1, committed by GitHub Feb 25, 2025
Showing 2 changed files with 4 additions and 0 deletions

transformer_engine/pytorch/tensor/float8_tensor.py +2 -0
transformer_engine/pytorch/tensor/mxfp8_tensor.py +2 -0

--- a/transformer_engine/pytorch/tensor/float8_tensor.py
+++ b/transformer_engine/pytorch/tensor/float8_tensor.py
@@ -484,6 +484,8 @@ class Float8Tensor(Float8TensorBase, QuantizedTensor):
        # Tensor device
        new_device = tensor.device if tensor.is_cuda else self.device
+        if not devices_match(new_device, tensor.device):
+            tensor = tensor.to(device=new_device)
        # Just copy FP8 data if other tensor is Float8Tensor
        if isinstance(tensor, Float8Tensor):

--- a/transformer_engine/pytorch/tensor/mxfp8_tensor.py
+++ b/transformer_engine/pytorch/tensor/mxfp8_tensor.py
@@ -368,6 +368,8 @@ class MXFP8Tensor(MXFP8TensorBase, QuantizedTensor):
        # Tensor device
        new_device = tensor.device if tensor.is_cuda else self.device
+        if not devices_match(new_device, tensor.device):
+            tensor = tensor.to(device=new_device)
        # Just copy FP8 data if other tensor is MXFP8Tensor
        if isinstance(tensor, MXFP8Tensor):
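
Both hunks add the same guard. Reading the diff, the destination device is the source tensor's device when that tensor is on CUDA, and otherwise the quantized tensor's own device. A CPU source, such as the one module._apply(lambda t: t.cpu()) hands back, is therefore moved to the destination before any quantization runs. The sketch below mirrors only that device-selection logic so it can run without a GPU or PyTorch. `Device`, `FakeTensor`, and `resolve_source` are hypothetical stand-ins invented for illustration, and this `devices_match` is a simplified assumption, not the helper that Transformer Engine actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for torch.device: a type plus an optional index."""
    type: str
    index: Optional[int] = None

    @property
    def is_cuda(self) -> bool:
        return self.type == "cuda"


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor that only tracks its device."""
    device: Device

    @property
    def is_cuda(self) -> bool:
        return self.device.is_cuda

    def to(self, device: Device) -> "FakeTensor":
        # Return a new object on the target device, as a tensor copy would.
        return FakeTensor(device=device)


def devices_match(a: Device, b: Device) -> bool:
    """Simplified assumption: plain equality of type and index."""
    return a == b


def resolve_source(self_device: Device, tensor: FakeTensor) -> FakeTensor:
    """Mirror the patched lines: pick the destination, then move the source if needed."""
    new_device = tensor.device if tensor.is_cuda else self_device
    if not devices_match(new_device, tensor.device):
        tensor = tensor.to(device=new_device)
    return tensor


gpu0 = Device("cuda", 0)
cpu = Device("cpu")

# A CPU source is moved to the quantized tensor's own device.
moved = resolve_source(gpu0, FakeTensor(cpu))
assert moved.device == gpu0

# A CUDA source already defines the destination, so it is returned untouched.
src = FakeTensor(Device("cuda", 1))
assert resolve_source(gpu0, src) is src
```

Under these assumptions the guard only ever fires for a non-CUDA source, because a CUDA source always matches the destination it defines.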