vasunvidia authored
* Initial commit
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Repro for RS output mismatch with Single GEMM + Split pipelined RS
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* minor changes for AG->GEMM pipelined overlap
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Add Atomic Gemm cublasApi attributes and initial implementation of AG->Atomic GEMM
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* AtomicGemm+RS functional with workaround
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* add amax update to layernorm_linear for FP8 unit test accuracy
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Enable reducescatter2_userbuff_strided variants
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Bug fix
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* AG+AtomicGemm overlap functional but gemm doesnt overlap with comm
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Add userbuffers_sendrecv kernel variants
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* TransformerLayer API changes to enable AtomicGemm+RS overlap
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Code cleanup
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Code cleanup2
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* [UB] AllGather Atomic GEMM overlap using userbuffer_sendrecv kernels
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Code cleanup + bug fix for multiatomic sendrecv kernel
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* cleanup
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Bug fixes
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* [UB] Add shuffling for better AG AtomicGEMM overlap
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Bug fix for AG AtomicGemm overlap
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Bug fix for multiAtomicAG and singleAtomicAG
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Use chunk_i+1 as recv_chunk for multiatomic_AG with shuffling
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Launch AtomicGEMM after first-chunk AG
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Rebase to main
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Add FP8 ReduceScatter kernels, AtomicGEMM+FP8 RS not functional
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Revert "Add FP8 ReduceScatter kernels, AtomicGEMM+FP8 RS not functional"
  This reverts commit 80a47a76355440cd5fb4314c96fe9fda632d87f9.
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Add support for NVLS-MC and FP8 Reduce Scatter
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Bug fix
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Atomic and Multiatomic FP8 RS functional
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Remove debug print
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* UB comm initialization hang fix
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Code cleanup
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Create new GEMM API for Atomic GEMM
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* CI ready
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* more fixes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* license
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Bug fix
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Revert NVLS-MC
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Check cu* versions for running atomic gemms
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* lint
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fixes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Cleanup
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Add experimental warning
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Better wording
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Add warning to c api
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix wording
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------

Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
958e1889