Add cuBLASMp-backed GEMM-like API to TE common (#1824)

* Pick up cuBLASMp during build
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Saving...
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Change lib order to fix link error
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Saving...
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Context creation, incomplete...
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Test fixture
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Saving...
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* A sanity AgGemm test, failing...
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Saving...
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Fix axes
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Take care of uneven distribution
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Use MPI to get position of local matrices
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Refactor
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Refactor & fixes
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Saving...
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Gemm-RS
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Gemm-AR, not working...
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Fixes
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Setting all-reduce epilogue for gemm-ar
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Use supported shapes for GEMM-AR
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Tweak tolerance
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* First shot at fp8
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Use TensorHolder in tests
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* More test configs
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Support comm_sm_count
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Parametrize dtypes for A, B and D separately
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Tweak scaling
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Amax ptr
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Flags parity with cublas_gemm, saving...
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Cleanup
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Bias tests
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Fix bias test
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Aux, saving...
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* aux_ld
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* A fix
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Use test::Tensor
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Set scale inv
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Remove unsupported test configs
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Tweak tests
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Replace libcal with NCCL
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Add NVTX markers to API functions
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Tweak GemmAr tests
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* More test config
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Fix merge fallout
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Remove MPI dependency, comment API, add algo parameter
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Fix nvshmem dependency
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Fix nvshmem build
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Exclude CommGemm tests from L0_cppunittest
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Add cpp_distributed sh file for CI
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Adapt to TensorAllocator
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* Skip GemmAr test on unsupported HW
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Oversubscribe is needed on some clusters
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Fix incomplete libcal removal
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Move CI tests to L1
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Rename context to include NVTE prefix
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Remove leftover code
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* NVTE_WITH_CUBLASMP off by default
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* More detailed NVTE_CHECK diag
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* Comment API
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Include stdbool header for legacy C compilers
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Remove now unused argument
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* Abstract away cuBLASMp algo behind our own enum
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* More detailed shape diag messages
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* Update transformer_engine/common/include/transformer_engine/comm_gemm.h
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Vladimir Cherepanov <56651474+mk-61@users.noreply.github.com>

* Add license
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

---------

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>
Signed-off-by: Vladimir Cherepanov <56651474+mk-61@users.noreply.github.com>
Co-authored-by: Vladimir Cherepanov <vcherepanov@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>