Unverified Commit beffb297 authored by Xin Yao, committed by GitHub

[PyTorch] Get `skip_fp8_weight_update` only in CUDA Graph Capturing (#1854)



Only get `skip_fp8_weight_update` when `fp8_graph_capturing()` is true.
Signed-off-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
parent 05f3b573
@@ -668,7 +668,10 @@ class GroupedLinear(TransformerEngineBaseModule):
         ), "GroupedLinear doesn't support input tensor in FP8."
         assert len(m_splits) == self.num_gemms, "Number of splits should match number of GEMMs."
-        skip_fp8_weight_update = FP8GlobalStateManager.get_skip_fp8_weight_update_tensor()
+        if FP8GlobalStateManager.fp8_graph_capturing():
+            skip_fp8_weight_update = FP8GlobalStateManager.get_skip_fp8_weight_update_tensor()
+        else:
+            skip_fp8_weight_update = None
         if skip_fp8_weight_update is not None:
             is_first_microbatch = False
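The control flow introduced by this commit can be sketched in isolation. The snippet below is illustrative only: `FakeFP8State` is a hypothetical stand-in for Transformer Engine's `FP8GlobalStateManager`, and `resolve_microbatch_flags` is a made-up helper that mirrors the patched logic, namely that the skip-weight-update tensor is queried only while a CUDA graph is being captured, so eager-mode forward passes leave `is_first_microbatch` untouched.

```python
# Hypothetical stand-in for FP8GlobalStateManager (not the real
# Transformer Engine API) used to illustrate the patched control flow.
class FakeFP8State:
    graph_capturing = False
    skip_weight_update_tensor = "skip-flag"  # stands in for a CUDA tensor

    @classmethod
    def fp8_graph_capturing(cls):
        return cls.graph_capturing

    @classmethod
    def get_skip_fp8_weight_update_tensor(cls):
        return cls.skip_weight_update_tensor


def resolve_microbatch_flags(is_first_microbatch):
    # Mirror the patched logic: only fetch the skip flag during graph
    # capture; otherwise leave it as None so the flag below is unchanged.
    if FakeFP8State.fp8_graph_capturing():
        skip_fp8_weight_update = FakeFP8State.get_skip_fp8_weight_update_tensor()
    else:
        skip_fp8_weight_update = None
    if skip_fp8_weight_update is not None:
        is_first_microbatch = False
    return skip_fp8_weight_update, is_first_microbatch
```

Outside capture, `resolve_microbatch_flags(True)` returns `(None, True)`; once `FakeFP8State.graph_capturing` is set, the skip tensor is fetched and `is_first_microbatch` is forced to `False`.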