- 27 Sep, 2025 1 commit
-
-
Phuong Nguyen authored
* init cgemm + unit tests * UB bootstrap with NCCL, no MPI dependency * add NVLINK-P2P check + error message * skip tests if no NVLINK available * use std::vector to store ncclComm_t * update misuse of TP warning Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 26 Aug, 2025 1 commit
-
-
Vladimir Cherepanov authored
* Pick up cuBLASMp during build Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Change lib order to fix link error Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Context creation, incomplete... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Test fixure Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * A sanity AgGemm test, failing... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix axes Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Take care of uneven distribution Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Use MPI to get position of local matrices Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Refactor Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Refactor & fixes Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Gemm-RS Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Gemm-AR, not working... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fixes Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Setting all-reduce epilogue for gemm-ar Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Use supported shapes for GEMM-AR Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Tweak tolerance Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * First shot at fp8 Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Use TensorHolder in tests Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * More test configs Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Support comm_sm_count Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Parametrize dtypes for A, B and D separately Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Tweak scaling Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Amax ptr Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Flags parity with cublas_gemm, saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Cleanup Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Bias tests Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix bias test Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Aux, saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * aux_ld Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * A fix Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Use test::Tensor Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Set scale inv Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Remove unsupported test configs Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Tweak tests Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Replace libcal with NCCL Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Add NVTX markers to API functions Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Tweak GemmAr tests Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * More test config Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix merge fallout Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Remove MPI dependency, comment API, add algo parameter Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix nvshmem dependency Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix nvshmem build Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Excluse CommGemm tests from L0_cppunittest Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Add cpp_distributed sh file for CI Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Adapt tp TensorAllocator Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Skip GemmAr test on unsupported HW Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Oversibscribe is needed on some clusters Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix incomplete libcal removal Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Move CI tests to L1 Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Rename context to include NVTE prefix Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Remove leftover code Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * NVTE_WITH_CUBLASMP off by default Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * More detailed NVTE_CHECK diag Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Comment API Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Include stdbool header for legacy C compilers Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Remove now unused argument Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Abstract away cuBLASMp algo behind our own enum Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * More detailed shape diag messages Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/common/include/transformer_engine/comm_gemm.h Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Vladimir Cherepanov <56651474+mk-61@users.noreply.github.com> * Add license Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> --------- Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> Signed-off-by:
Vladimir Cherepanov <56651474+mk-61@users.noreply.github.com> Co-authored-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com>
-
- 07 May, 2025 1 commit
-
-
Tim Moon authored
* Initial work toward restoring UB support in te.Sequential Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Forward UB linear runs, but has numerical error Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug UB forward tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Minor tweaks Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove Python checks for MXFP8 UB linear forward Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add dim check for MXFP8 full tiles Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Move QuantizedTensor logic out of UB comm and into Python helper function Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Support MXFP8 AGs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Coalesce NCCL all-gathers for MXFP8 all-gather Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Initial impl of backward UB linear in te.Sequential Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug UB linear backward with no quantization dgrad GEMM + dx RS is still broken. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix chunk dims for dgrad GEMM + dx RS Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debugging MXFP8 UB cases Still failing with dy AG + wgrad GEMM Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use NCCL to overlap dy AG with dgrad GEMM Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug UB GEMM tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Initial refactoring of linear module forward Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Refactor linear module backward Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug linear module UB tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Tweak test tensor dims Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Do not store autograd context within wgrad GEMM closure Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update LayerNormLinear Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update LayerNormMLP Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug UB tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Minor style tweaks Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix incorrect usage for GEMM input with block-scaled FP8 Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix RS out dims Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable dgrad GEMM + UB AG + NCCL AG overlapping Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Disable dgrad GEMM + UB AG + NCCL AG overlap in te.Sequential Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore support for internal quantized tensors Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add tests for MXFP8 GEMM with UB Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 17 Apr, 2025 1 commit
-
-
yuguo authored
-
- 20 Mar, 2025 1 commit
-
-
yuguo authored
-
- 02 Jan, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Jan, 2024 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 13 Dec, 2023 1 commit
-
-
cyanguwa authored
* fix backend selection for sm80 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix compiling warnings in sdpa flash Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add nvte error messages Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add NVTE_CHECK_CUDNN_FE for error messaging Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * disable pylint bare-except Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 24 Oct, 2023 1 commit
-
-
Tim Moon authored
* Do not include logging macros in installed C headers Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug logging macros Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug C++ tests Use Google style for header includes. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update CUDA driver macros Incorporating changes from #389. Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Jan Bielak <jbielak@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use core error checking macros in PyTorch extensions Hack to get around macro redefinition warning. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix missing arg when getting CUDA driver error string Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Reuse logging header in frameworks Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Jan Bielak <jbielak@nvidia.com>
-