Unverified Commit 76bced54 authored by Kirthi Shankar Sivamani, committed by GitHub

`NVFP4BlockScaling` recipe docs (#2241)



* Improve docstring for NVFP4 recipe
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Add NVFP4BlockScaling to recipe docs
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Grammar
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* improve wording
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update transformer_engine/common/recipe/__init__.py
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update transformer_engine/common/recipe/__init__.py
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update transformer_engine/common/recipe/__init__.py
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update transformer_engine/common/recipe/__init__.py
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update transformer_engine/common/recipe/__init__.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
parent 127b6d3a
...@@ -12,6 +12,8 @@ Common API
.. autoapiclass:: transformer_engine.common.recipe.MXFP8BlockScaling(fp8_format=Format.E4M3)
.. autoapiclass:: transformer_engine.common.recipe.NVFP4BlockScaling(fp4_format=Format.E2M1)
.. autoapiclass:: transformer_engine.common.recipe.Float8CurrentScaling(fp8_format=Format.HYBRID)
.. autoapiclass:: transformer_engine.common.recipe.Float8BlockScaling(fp8_format=Format.E4M3)
...@@ -401,16 +401,32 @@ class NVFP4BlockScaling(Recipe):
computed from the high precision input to avoid double quantization
errors.
The default NVFP4 training recipe implements 3 techniques for quantizing
to a narrow format (4-bit):
- For weight tensors a variant of the NVFP4 quantization is used,
where a single scaling factor is shared by a 2D block of 16x16 elements.
- When quantizing gradients, stochastic rounding is applied to avoid the bias
introduced by quantization. With this, values are rounded probabilistically
to one of their two nearest representable numbers, with probabilities
inversely proportional to their distances.
- When quantizing inputs and gradients, random Hadamard transforms are applied
(16x16 Hadamard matrix) to smooth outliers in the tensor distributions
and make them easier to represent accurately in NVFP4.
These techniques are described more comprehensively in the NVFP4 paper titled
'Pretraining Large Language Models with NVFP4' (https://arxiv.org/abs/2509.25149v1).
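
A minimal illustrative sketch of the three techniques described above, in NumPy. This is not the Transformer Engine implementation: the E2M1 grid values, the per-block scale convention, and the randomized-Hadamard construction below are assumptions made for illustration only.

```python
import numpy as np

E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # non-negative FP4 (E2M1) magnitudes

def hadamard_16():
    """Orthonormal 16x16 Hadamard matrix built from Kronecker powers of [[1, 1], [1, -1]]."""
    h2 = np.array([[1.0, 1.0], [1.0, -1.0]])
    h = h2
    for _ in range(3):
        h = np.kron(h, h2)
    return h / np.sqrt(16.0)

def random_hadamard_transform(x, rng):
    """Randomly flip signs, then rotate each length-16 block; spreads outliers across the block."""
    signs = rng.choice([-1.0, 1.0], size=16)
    return (x.reshape(-1, 16) * signs) @ hadamard_16()

def blockwise_scales_2d(w, block=16):
    """One shared scale per 16x16 block of a 2D weight, mapping each block's amax onto the grid maximum."""
    r, c = w.shape
    blocks = w.reshape(r // block, block, c // block, block)
    amax = np.abs(blocks).max(axis=(1, 3))
    return amax / E2M1_GRID[-1]  # divide each 16x16 block by its scale before rounding

def stochastic_round_to_grid(x, rng):
    """Round each element to one of its two nearest grid values, with probability
    inversely proportional to the distance (the closer value is more likely)."""
    sign, mag = np.sign(x), np.clip(np.abs(x), 0.0, E2M1_GRID[-1])
    hi_idx = np.clip(np.searchsorted(E2M1_GRID, mag), 1, len(E2M1_GRID) - 1)
    lo, hi = E2M1_GRID[hi_idx - 1], E2M1_GRID[hi_idx]
    p_up = (mag - lo) / (hi - lo)  # probability of rounding up to the larger neighbour
    return sign * np.where(rng.random(mag.shape) < p_up, hi, lo)

rng = np.random.default_rng(0)
grad = rng.standard_normal((16, 16)).astype(np.float32)
smoothed = random_hadamard_transform(grad, rng)        # RHT as applied to inputs/gradients
scale = np.abs(smoothed).max() / E2M1_GRID[-1]         # simple per-tensor scale for the demo
quantized = scale * stochastic_round_to_grid(smoothed / scale, rng)

weight = rng.standard_normal((64, 64)).astype(np.float32)
weight_scales = blockwise_scales_2d(weight)            # 16x16 shared scaling factors for weights
```

In the library itself these steps run inside optimized quantization kernels; the sketch only mirrors the numerics the docstring describes.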
Parameters
----------
fp4_format : {Format.E2M1}, default = Format.E2M1
    FP4 data type.
disable_rht : bool, default = `False`
    If set to `True`, random Hadamard transforms are not applied to any tensor.
disable_stochastic_rounding : bool, default = `False`
    If set to `True`, stochastic rounding is disabled during quantization for all tensors.
disable_2d_quantization : bool, default = `False`
    If set to `True`, 1D block scaling with block size 16 is used for all tensors.
"""
# Configuration envvars # Configuration envvars
......
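
For reference, a hypothetical usage sketch of the recipe documented above, assuming the PyTorch `fp8_autocast` entry point accepts it the same way it accepts the FP8 recipes; parameter names come from the docstring in this diff, but exact integration points may differ between Transformer Engine versions.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Format, NVFP4BlockScaling

# Hypothetical configuration; all flags shown at their documented defaults.
recipe = NVFP4BlockScaling(
    fp4_format=Format.E2M1,             # only E2M1 is listed as supported
    disable_rht=False,                  # keep random Hadamard transforms on
    disable_stochastic_rounding=False,  # keep stochastic rounding for gradients
    disable_2d_quantization=False,      # keep 16x16 2D block scaling for weights
)

layer = te.Linear(1024, 1024).cuda()
inp = torch.randn(32, 1024, device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    out = layer(inp)
out.sum().backward()
```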