Add logic for block-scaled tensors with GEMM swizzled scales (#2486)
* Add general C API for setting tensor params Signed-off-by:Tim Moon <tmoon@nvidia.com> * Implement general accessors for NVTETensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Refactor tex swizzling to skip if scales are already swizzled Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add checks for non-swizzled scales in MXFP8 and NVFP4 kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Support pre-swizzled scales in MXFP8Tensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add tex function to swizzle MXFP8 scales Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix bug in inplace swizzle function Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Tweak comments to use "compact/swizzled format" Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MXFP8 quantize kernel with pre-swizzled scales Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Expose pre-swizzled scales in modules Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix bug in multi-swizzle Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Support MXFP8 gated activations with swizzled scales Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add PyTorch infrastructure for pre-swizzled NVFP4 tensors Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Deprecate DSv3-specific quantization logic in C API Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove support for DSv3 compact data from quantizer Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove DSv3 compact data format from core lib Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix bug in FP8 all-gather Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update JAX to use new swizzled scale API Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Review suggestion from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestions from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update C++ swizzle test with swizzled scales API Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Return default tensor params when querying params for invalid NVTETensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug DSv3 FP8 test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug Userbuffers test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Make sure gated activations populate FP8 transpose if needed Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Review suggestions from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable pre-swizzling with debug quantizer Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestion from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix merge conflicts and review suggestions Update copyright years. Tweak comments. Fix various complaints from @greptile-apps. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use explicitly sized types in config accessors Miscellaneous review suggestions from @ptrendx. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Make util header for function that compute swizzled scale index Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestions from @greptile-apps Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Update expected error message in FP8 block-scaling test Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestion from @yaox12 Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Showing
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Please register or sign in to comment