-
Jan Bielak authored
* Update to_string(NVTEScalingMode) to include block scaling Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Add `nvte_swizzle_block_scaling_to_mxfp8_scaling_factors` Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Convert FP8 block scaling tensors to MXFP8 tensors on Blackwell and newer in GEMM Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Allow Blackwell and newer in Deepseek recipe compatbility check Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Allow data_rows % 4 != 0 in 1d kernel Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Load scaling factors in unswizzled order in 1d kernel Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Enforce use of power of two scaling Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Skip the FP8 block scaling exact GEMM test on Blackwell Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Skip further tests with pow_2_scales=False Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Initial implementation of tensor conversion for grouped gemm Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Skip non power of two scaling cpp unit tests Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Fix handling of all gather Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestions from code review Co-authored-by:
Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Use compute capability 10.0 for logic with Blackwell Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Apply suggestions from code review Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Jan Bielak <jbielak@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
dfe5b7df