-
Tim Moon authored
* Delete row-wise data in single-GPU linear forward Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug Python->C++ parsing of transpose-only Float8Tensors Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug tensor shape calculation without row-wise data Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug correctness issues with only column-wise data Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Only cache column-wise input in LayerNormLinear Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Support MXFP8 all-gather with only column-wise data Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix moe cases, lint, rm unused ctx Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix CPU activation offloading and use consistent logic for save/restore Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix typo Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * RM stray file Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix distributed and cpp tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix norm cpp tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Rm stray file Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * RM stray file Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix MXFP8 AG Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix FP8 with sequence parallelism Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix UB bulk dgrad Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
8a20d666