[Pytorch] Add Cutlass Grouped GEMM Support for fine-grained MoE Model (#2045)
* feat: add cutlass group gemm support Signed-off-by:Min Yang <min.yang@shopee.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactor: refactor multi tensor gemm interface Signed-off-by:
Min Yang <min.yang@shopee.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactor: refactor nvte_multi_stream_cublas_gemm func and add license info Signed-off-by:
Min Yang <min.yang@shopee.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: add unit test for cutlass group gemm Signed-off-by:
Min Yang <min.yang@shopee.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: add cutlass support type protect Signed-off-by:
Min Yang <min.yang@shopee.com> * add tests and fix lint Signed-off-by:
Xin Yao <xiny@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: fix unit tests error Signed-off-by:
Min Yang <min.yang@shopee.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: refactor host workspace malloc Signed-off-by:
Min Yang <min.yang@shopee.com> * update cutlass Signed-off-by:
Xin Yao <xiny@nvidia.com> * update cutlass Signed-off-by:
Xin Yao <xiny@nvidia.com> * further relex threshold and add a env var to warn fall back Signed-off-by:
Xin Yao <xiny@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Min Yang <min.yang@shopee.com> Signed-off-by:
Xin Yao <xiny@nvidia.com> Signed-off-by:
alan yang <89962857+cassiewilliam@users.noreply.github.com> Co-authored-by:
Min Yang <min.yang@shopee.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
Showing
Please register or sign in to comment