Unverified Commit 76bced54 authored by Kirthi Shankar Sivamani, committed by GitHub

`NVFP4BlockScaling` recipe docs (#2241)



* Improve docstring for NVFP4 recipe
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Add NVFP4BlockScaling to recipe docs
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Grammar
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* improve wording
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update transformer_engine/common/recipe/__init__.py
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update transformer_engine/common/recipe/__init__.py
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update transformer_engine/common/recipe/__init__.py
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update transformer_engine/common/recipe/__init__.py
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update transformer_engine/common/recipe/__init__.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
parent 127b6d3a
...@@ -12,6 +12,8 @@ Common API
.. autoapiclass:: transformer_engine.common.recipe.MXFP8BlockScaling(fp8_format=Format.E4M3)
.. autoapiclass:: transformer_engine.common.recipe.NVFP4BlockScaling(fp4_format=Format.E2M1)
.. autoapiclass:: transformer_engine.common.recipe.Float8CurrentScaling(fp8_format=Format.HYBRID)
.. autoapiclass:: transformer_engine.common.recipe.Float8BlockScaling(fp8_format=Format.E4M3)
...@@ -401,16 +401,32 @@ class NVFP4BlockScaling(Recipe):
computed from the high precision input to avoid double quantization
errors.
The default NVFP4 training recipe implements 3 techniques for quantizing
to a narrow format (4-bit):
- For weight tensors a variant of the NVFP4 quantization is used,
where a single scaling factor is shared by a 2D block of 16x16 elements.
- When quantizing gradients, stochastic rounding is applied to avoid the bias
introduced by quantization. With this, values are rounded probabilistically
to one of their two nearest representable numbers, with probabilities
inversely proportional to their distances.
- When quantizing inputs and gradients, random Hadamard transforms are applied
(16x16 Hadamard matrix) to smooth outliers in the tensor distributions
and make them easier to represent accurately in NVFP4.
These techniques are described more comprehensively in the NVFP4 paper titled
'Pretraining Large Language Models with NVFP4' (https://arxiv.org/abs/2509.25149v1).
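
A minimal illustrative sketch of the three techniques described above, in NumPy. This is not the Transformer Engine implementation: the E2M1 grid values, the per-block scale convention, and the randomized-Hadamard construction below are assumptions made for illustration only.

```python
import numpy as np

E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # non-negative FP4 (E2M1) magnitudes

def hadamard_16():
    """Orthonormal 16x16 Hadamard matrix built from Kronecker powers of [[1, 1], [1, -1]]."""
    h2 = np.array([[1.0, 1.0], [1.0, -1.0]])
    h = h2
    for _ in range(3):
        h = np.kron(h, h2)
    return h / np.sqrt(16.0)

def random_hadamard_transform(x, rng):
    """Randomly flip signs, then rotate each length-16 block; spreads outliers across the block."""
    signs = rng.choice([-1.0, 1.0], size=16)
    return (x.reshape(-1, 16) * signs) @ hadamard_16()

def blockwise_scales_2d(w, block=16):
    """One shared scale per 16x16 block of a 2D weight, mapping each block's amax onto the grid maximum."""
    r, c = w.shape
    blocks = w.reshape(r // block, block, c // block, block)
    amax = np.abs(blocks).max(axis=(1, 3))
    return amax / E2M1_GRID[-1]  # divide each 16x16 block by its scale before rounding

def stochastic_round_to_grid(x, rng):
    """Round each element to one of its two nearest grid values, with probability
    inversely proportional to the distance (the closer value is more likely)."""
    sign, mag = np.sign(x), np.clip(np.abs(x), 0.0, E2M1_GRID[-1])
    hi_idx = np.clip(np.searchsorted(E2M1_GRID, mag), 1, len(E2M1_GRID) - 1)
    lo, hi = E2M1_GRID[hi_idx - 1], E2M1_GRID[hi_idx]
    p_up = (mag - lo) / (hi - lo)  # probability of rounding up to the larger neighbour
    return sign * np.where(rng.random(mag.shape) < p_up, hi, lo)

rng = np.random.default_rng(0)
grad = rng.standard_normal((16, 16)).astype(np.float32)
smoothed = random_hadamard_transform(grad, rng)        # RHT as applied to inputs/gradients
scale = np.abs(smoothed).max() / E2M1_GRID[-1]         # simple per-tensor scale for the demo
quantized = scale * stochastic_round_to_grid(smoothed / scale, rng)

weight = rng.standard_normal((64, 64)).astype(np.float32)
weight_scales = blockwise_scales_2d(weight)            # 16x16 shared scaling factors for weights
```

In the library itself these steps run inside optimized quantization kernels; the sketch only mirrors the numerics the docstring describes.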
Parameters
----------
fp4_format : {Format.E2M1}, default = Format.E2M1
    FP4 data type.
disable_rht : bool, default = `False`
    If set to `True`, random Hadamard transforms are not applied to any tensor.
disable_stochastic_rounding : bool, default = `False`
    If set to `True`, stochastic rounding is disabled during quantization for all tensors.
disable_2d_quantization : bool, default = `False`
    If set to `True`, 1D block scaling with block size 16 is used for all tensors.
"""
# Configuration envvars # Configuration envvars
......
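
For reference, a hypothetical usage sketch of the recipe documented above, assuming the PyTorch `fp8_autocast` entry point accepts it the same way it accepts the FP8 recipes; parameter names come from the docstring in this diff, but exact integration points may differ between Transformer Engine versions.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Format, NVFP4BlockScaling

# Hypothetical configuration; all flags shown at their documented defaults.
recipe = NVFP4BlockScaling(
    fp4_format=Format.E2M1,             # only E2M1 is listed as supported
    disable_rht=False,                  # keep random Hadamard transforms on
    disable_stochastic_rounding=False,  # keep stochastic rounding for gradients
    disable_2d_quantization=False,      # keep 16x16 2D block scaling for weights
)

layer = te.Linear(1024, 1024).cuda()
inp = torch.randn(32, 1024, device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    out = layer(inp)
out.sum().backward()
```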