Update staging branch. (#706)

* update daily build from rocm 5.4.3 to 5.5 (#693)
* Fix grouped_gemm_splitk kernels on MI300. (#694)
  * replace amd_buffer_atomic_add with hip_atomic_add
  * fix grouped_gemm_splitk kernels on mi300
  * fix syntax
  * revert experimental atomic_add changes
  ---------
  Co-authored-by: Jing Zhang <jizhan@amd.com>
* Fix the group of quantization_int8 kernels on MI300. (#695)
  * replace amd_buffer_atomic_add with hip_atomic_add
  * fix grouped_gemm_splitk kernels on mi300
  * fix syntax
  * revert experimental atomic_add changes
  * fix the group of kernels from ticket 723 on MI300
  ---------
  Co-authored-by: Jing Zhang <jizhan@amd.com>
* Optimize bf16 conversion (#664)
  * Add TypeConvert class and start refactoring
  * Refactor TypeConvert as a struct
  * Get back to template functions type_convert
  * Add a type_convert_bf16_rtn, set rtz as default
  * Clean up
  * Add UnaryConvertPrecision struct for high-precision workloads
  * Format
  * Update type_convert to UnaryConvert on threadwise level
  * Update UnaryConvertPrecision
  * Format
  * Fix chmod
  * Add a flag to pick conversion method
  * Format
  * Remove the added flag
  * Merge elementwise op with type conversion
  * Move type_convert to elemwise op, update the op
  * Update type_convert_precision -> bf16_convert_rtn
  * Clean up
  * Update comments
  * Update the CK_WORKAROUND_DENORM_FIX flag handling
  * Update the unneeded op to work but warn user
  * Remove the message
  * Use a PassThrough instead of ConvertBF16RTN to calculate reference
  * Format
  * Add missing include
* Normalization/split k (#615)
* Add contraction profiler and tests (#701)
  * Add contraction profiler and tests
  * Build and style fixes
  * Allow to use any elementwise operator for ref_contraction
  * Introduce profile_contraction_scale and profile_contraction_bilinear
  * Make ref_contraction generic and extend interface tests
  * Stylistic minor fixes
  * Extend test_contraction_interface
---------
Co-authored-by: Jing Zhang <jizhan@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

Commit 72b7ae25 authored May 15, 2023 by Illia Silin, committed by GitHub May 15, 2023
5 changed files
--- a/profiler/src/profile_contraction_scale.cpp
+++ b/profiler/src/profile_contraction_scale.cpp
+// SPDX-License-Identifier: MIT
+// Copyright (c) 2023, Advanced Micro Devices, Inc. All rights reserved.
+#include <iostream>
+#include <numeric>
+#include <initializer_list>
+#include <cstdlib>
+#include <string>
+#include <vector>
+#include "profiler/profile_contraction_impl.hpp"
+#include "profiler/profile_contraction_utils.hpp"
+#include "profiler_operation_registry.hpp"
+#define OP_NAME "contraction_scale"
+#define OP_DESC "CONTRACTION+Scale"
+static void print_helper_msg()
+{
+    std::cout << "arg1: tensor operation (" OP_NAME ": " OP_DESC ")\n"
+              << "arg2: data type (0: fp32; 1: fp64)\n"
+              << "arg3: matrix layout (0: A[m0, m1, k0, k1] * B[k0, k1, n0, n1] + "
+                 "D[m0, m1, n0, n1] = E[m0, m1, n0, n1];\n"
+              << "                     1: A[m0, m1, k0, k1] * B[n0, n1, k0, k1] + "
+                 "D[m0, m1, n0, n1] = E[m0, m1, n0, n1];\n"
+              << "                     2: A[k0, k1, m0, m1] * B[k0, k1, n0, n1] + "
+                 "D[m0, m1, n0, n1] = E[m0, m1, n0, n1];\n"
+              << "                     3: A[k0, k1, m0, m1] * B[n0, n1, k0, k1] + "
+                 "D[m0, m1, n0, n1] = E[m0, m1, n0, n1])\n"
+              << "arg4: verification (0: no; 1: yes)\n"
+              << "arg5: initialization (0: no init; 1: integer value; 2: decimal "
+              << "value)\n"
+              << "arg6: print tensor value (0: no; 1: yes)\n"
+              << "arg7: time kernel (0: no; 1: yes)\n"
+              << "arg8: alpha\n"
+              << "arg9 to 14: M0, M1, N0, N1, K0, K1\n"
+              << "arg15 to 30: Strides for A, B, E and D (skip for default)\n"
+              << std::endl;
+}
+int profile_contraction_scale(int argc, char* argv[])
+{
+    const bool default_strides = argc == 15;
+    if(argc != 31 && argc != 15)
+    {
+        print_helper_msg();
+        exit(1);
+    }
+    const auto data_type          = static_cast<ContractionDataType>(std::stoi(argv[2]));
+    const auto layout             = static_cast<ContractionMatrixLayout>(std::stoi(argv[3]));
+    const bool do_verification    = std::stoi(argv[4]);
+    const ck::index_t init_method = std::stoi(argv[5]);
+    const bool do_log             = std::stoi(argv[6]);
+    const bool time_kernel        = std::stoi(argv[7]);
+    const float alpha             = std::stof(argv[8]);
+    std::vector<ck::index_t> M;
+    std::vector<ck::index_t> N;
+    std::vector<ck::index_t> K;
+    const ck::index_t dims_arg_num = 9;
+    collect_index_params(argv, M, dims_arg_num, 2);
+    collect_index_params(argv, N, dims_arg_num + 2, 2);
+    collect_index_params(argv, K, dims_arg_num + 4, 2);
+    std::vector<ck::index_t> StridesA;
+    std::vector<ck::index_t> StridesB;
+    std::vector<ck::index_t> StridesE;
+    std::vector<ck::index_t> StridesD;
+    if(!default_strides)
+    {
+        collect_index_params(argv, StridesA, dims_arg_num + 6, 4);
+        collect_index_params(argv, StridesB, dims_arg_num + 10, 4);
+        collect_index_params(argv, StridesE, dims_arg_num + 14, 4);
+        collect_index_params(argv, StridesD, dims_arg_num + 18, 4);
+    }
+    using F32 = float;
+    using F64 = double;
+    auto profile = [&](auto a_layout, auto b_layout, auto cde_layout, auto type) {
+        using ALayout   = decltype(a_layout);
+        using BLayout   = decltype(b_layout);
+        using CDELayout = decltype(cde_layout);
+        using DataType = decltype(type);
+        if(default_strides)
+        {
+            assign_default_strides(a_layout, StridesA, {M[0], M[1], K[0], K[1]});
+            assign_default_strides(b_layout, StridesB, {K[0], K[1], N[0], N[1]});
+            assign_default_strides(cde_layout, StridesE, {M[0], M[1], N[0], N[1]});
+            assign_default_strides(cde_layout, StridesD, {M[0], M[1], N[0], N[1]});
+        }
+        bool pass = ck::profiler::
+            profile_contraction_impl<ALayout, BLayout, CDELayout, DataType, ck::Tuple<>, Scale>(
+                do_verification,
+                init_method,
+                do_log,
+                time_kernel,
+                Scale{alpha},
+                M,
+                N,
+                K,
+                StridesA,
+                StridesB,
+                StridesE,
+                StridesD);
+        return pass;
+    };
+    if(data_type == ContractionDataType::F32_F32_F32_F32 &&
+       layout == ContractionMatrixLayout::MK_KN_MN_MN)
+    {
+        return profile(Row{}, Row{}, Row{}, F32{});
+    }
+    else if(data_type == ContractionDataType::F32_F32_F32_F32 &&
+            layout == ContractionMatrixLayout::MK_NK_MN_MN)
+    {
+        return profile(Row{}, Col{}, Row{}, F32{});
+    }
+    else if(data_type == ContractionDataType::F32_F32_F32_F32 &&
+            layout == ContractionMatrixLayout::KM_KN_MN_MN)
+    {
+        return profile(Col{}, Row{}, Row{}, F32{});
+    }
+    else if(data_type == ContractionDataType::F32_F32_F32_F32 &&
+            layout == ContractionMatrixLayout::KM_NK_MN_MN)
+    {
+        return profile(Col{}, Col{}, Row{}, F32{});
+    }
+    else if(data_type == ContractionDataType::F64_F64_F64_F64 &&
+            layout == ContractionMatrixLayout::MK_KN_MN_MN)
+    {
+        return profile(Row{}, Row{}, Row{}, F64{});
+    }
+    else if(data_type == ContractionDataType::F64_F64_F64_F64 &&
+            layout == ContractionMatrixLayout::MK_NK_MN_MN)
+    {
+        return profile(Row{}, Col{}, Row{}, F64{});
+    }
+    else if(data_type == ContractionDataType::F64_F64_F64_F64 &&
+            layout == ContractionMatrixLayout::KM_KN_MN_MN)
+    {
+        return profile(Col{}, Row{}, Row{}, F64{});
+    }
+    else if(data_type == ContractionDataType::F64_F64_F64_F64 &&
+            layout == ContractionMatrixLayout::KM_NK_MN_MN)
+    {
+        return profile(Col{}, Col{}, Row{}, F64{});
+    }
+    else
+    {
+        std::cout << "this data_type & layout is not implemented" << std::endl;
+        return 1;
+    }
+}
+REGISTER_PROFILER_OPERATION(OP_NAME, OP_DESC, profile_contraction_scale);
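Note on the argument count in the file above: `argv[0]` is the executable, `argv[1]` the operation name, `argv[2]` to `argv[8]` the options through alpha, and `argv[9]` to `argv[14]` the six dimension lengths, which gives `argc == 15`. Sixteen more stride values give `argc == 31`. In the 15-argument case the strides come from `assign_default_strides`, which is defined in `profiler/profile_contraction_utils.hpp` and is not part of this diff. The sketch below is only a stand-in for the row-major case, assuming packed (contiguous) tensors. The helper name `packed_row_major_strides` is hypothetical, and `std::int32_t` is used in place of `ck::index_t` on the assumption that the two are equivalent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of CK: strides of a packed row-major tensor.
// The last dimension is contiguous (stride 1); every earlier stride is the
// product of all lengths to its right.
std::vector<std::int32_t> packed_row_major_strides(const std::vector<std::int32_t>& lengths)
{
    assert(!lengths.empty());
    std::vector<std::int32_t> strides(lengths.size(), 1);
    for(std::size_t i = lengths.size() - 1; i > 0; --i)
    {
        strides[i - 1] = strides[i] * lengths[i];
    }
    return strides;
}
```

For lengths `{32, 32, 32, 32}` this yields `{32768, 1024, 32, 1}`, the same values that the first parameter set in `test/contraction/test_contraction.cpp` passes explicitly.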
--- a/test/CMakeLists.txt
+++ b/test/CMakeLists.txt
@@ -56,6 +56,7 @@ add_subdirectory(normalization)
 add_subdirectory(data_type)
 add_subdirectory(elementwise_normalization)
 add_subdirectory(batchnorm)
+add_subdirectory(contraction)
 if(GPU_TARGETS MATCHES "gfx1100")
    add_subdirectory(wmma_op)
 endif()
--- a/test/contraction/CMakeLists.txt
+++ b/test/contraction/CMakeLists.txt
+add_gtest_executable(test_contraction test_contraction.cpp)
+add_gtest_executable(test_contraction_interface test_contraction_interface.cpp)
+target_link_libraries(test_contraction PRIVATE utility device_contraction_bilinear_instance device_contraction_scale_instance)
+target_link_libraries(test_contraction_interface PRIVATE utility device_contraction_bilinear_instance device_contraction_scale_instance)
--- a/test/contraction/test_contraction.cpp
+++ b/test/contraction/test_contraction.cpp
+// SPDX-License-Identifier: MIT
+// Copyright (c) 2023, Advanced Micro Devices, Inc. All rights reserved.
+#include <cstdlib>
+#include <iostream>
+#include <memory>
+#include <initializer_list>
+#include <vector>
+#include <tuple>
+#include <gtest/gtest.h>
+#include "profiler/profile_contraction_impl.hpp"
+using F32 = float;
+using F64 = double;
+using Row = ck::tensor_layout::gemm::RowMajor;
+using Col = ck::tensor_layout::gemm::ColumnMajor;
+using Bilinear = ck::tensor_operation::element_wise::Bilinear;
+using Scale    = ck::tensor_operation::element_wise::Scale;
+struct MemoryParams
+{
+    std::vector<ck::index_t> M;
+    std::vector<ck::index_t> N;
+    std::vector<ck::index_t> K;
+    std::vector<ck::index_t> StridesA;
+    std::vector<ck::index_t> StridesB;
+    std::vector<ck::index_t> StridesC;
+    std::vector<ck::index_t> StridesD;
+};
+template <typename Tuple>
+class TestContraction : public ::testing::Test
+{
+    protected:
+    using ALayout        = std::tuple_element_t<0, Tuple>;
+    using BLayout        = std::tuple_element_t<1, Tuple>;
+    using CDLayout       = std::tuple_element_t<2, Tuple>;
+    using DataType       = std::tuple_element_t<3, Tuple>;
+    using DTupleDataType = std::tuple_element_t<4, Tuple>;
+    using CDElementOp    = std::tuple_element_t<5, Tuple>;
+    std::vector<MemoryParams> list_of_memory_params = {{{32, 32},
+                                                        {32, 32},
+                                                        {32, 32},
+                                                        {32768, 1024, 32, 1},
+                                                        {32768, 1024, 32, 1},
+                                                        {32768, 1024, 32, 1},
+                                                        {32768, 1024, 32, 1}},
+                                                       {{16, 16},
+                                                        {32, 32},
+                                                        {16, 16},
+                                                        {4096, 256, 16, 1},
+                                                        {16, 1, 8192, 256},
+                                                        {16384, 1024, 32, 1},
+                                                        {16384, 1024, 32, 1}}};
+    std::vector<ck::index_t> init_methods = {0, 1, 2};
+    std::unique_ptr<CDElementOp> p_cd_element_op;
+    void Run()
+    {
+        for(auto& memory_params : list_of_memory_params)
+        {
+            for(const ck::index_t init_method : init_methods)
+            {
+                bool pass =
+                    ck::profiler::profile_contraction_impl<ALayout,
+                                                           BLayout,
+                                                           CDLayout,
+                                                           DataType,
+                                                           DTupleDataType,
+                                                           CDElementOp>(true /*do_verification*/,
+                                                                        init_method,
+                                                                        false /*do_logs*/,
+                                                                        false /*time_kernel*/,
+                                                                        *p_cd_element_op,
+                                                                        memory_params.M,
+                                                                        memory_params.N,
+                                                                        memory_params.K,
+                                                                        memory_params.StridesA,
+                                                                        memory_params.StridesB,
+                                                                        memory_params.StridesC,
+                                                                        memory_params.StridesD);
+                EXPECT_TRUE(pass);
+            }
+        }
+    }
+};
+template <typename Tuple>
+class TestContractionScale : public TestContraction<Tuple>
+{
+};
+template <typename Tuple>
+class TestContractionBilinear : public TestContraction<Tuple>
+{
+};
+using BilinearKernelTypes =
+    ::testing::Types<std::tuple<Row, Row, Row, F32, ck::Tuple<F32>, Bilinear>,
+                     std::tuple<Row, Col, Row, F32, ck::Tuple<F32>, Bilinear>,
+                     std::tuple<Col, Row, Row, F32, ck::Tuple<F32>, Bilinear>,
+                     std::tuple<Col, Col, Row, F32, ck::Tuple<F32>, Bilinear>,
+                     std::tuple<Row, Row, Row, F64, ck::Tuple<F32>, Bilinear>,
+                     std::tuple<Row, Col, Row, F64, ck::Tuple<F32>, Bilinear>,
+                     std::tuple<Col, Row, Row, F64, ck::Tuple<F32>, Bilinear>,
+                     std::tuple<Col, Col, Row, F64, ck::Tuple<F32>, Bilinear>>;
+using ScaleKernelTypes = ::testing::Types<std::tuple<Row, Row, Row, F32, ck::Tuple<>, Scale>,
+                                          std::tuple<Row, Col, Row, F32, ck::Tuple<>, Scale>,
+                                          std::tuple<Col, Row, Row, F32, ck::Tuple<>, Scale>,
+                                          std::tuple<Col, Col, Row, F32, ck::Tuple<>, Scale>,
+                                          std::tuple<Row, Row, Row, F64, ck::Tuple<>, Scale>,
+                                          std::tuple<Row, Col, Row, F64, ck::Tuple<>, Scale>,
+                                          std::tuple<Col, Row, Row, F64, ck::Tuple<>, Scale>,
+                                          std::tuple<Col, Col, Row, F64, ck::Tuple<>, Scale>>;
+TYPED_TEST_SUITE(TestContractionBilinear, BilinearKernelTypes);
+TYPED_TEST_SUITE(TestContractionScale, ScaleKernelTypes);
+TYPED_TEST(TestContractionBilinear, bilinear)
+{
+    this->p_cd_element_op = std::make_unique<Bilinear>(1.f, 1.f);
+    this->Run();
+    this->p_cd_element_op = std::make_unique<Bilinear>(-0.5f, 0.5f);
+    this->Run();
+}
+TYPED_TEST(TestContractionScale, scale)
+{
+    this->p_cd_element_op = std::make_unique<Scale>(1.f);
+    this->Run();
+    this->p_cd_element_op = std::make_unique<Scale>(0.5f);
+    this->Run();
+}
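For orientation, the two typed test suites above differ only in the element-wise operation applied to the contraction result. The sketch below is a host-side reference, not the CK implementation: it assumes `Bilinear{alpha, beta}` computes `alpha * C + beta * D` and `Scale{alpha}` computes `alpha * C`, and it flattens the index pairs (m0, m1), (n0, n1), (k0, k1) into single row-major indices m, n, k. The function name `reference_contraction` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference, assuming packed row-major storage:
//   E[m][n] = alpha * sum_k A[m][k] * B[k][n] + beta * D[m][n]
// Passing beta = 0 gives the Scale form (D is then ignored numerically).
std::vector<float> reference_contraction(const std::vector<float>& a,
                                         const std::vector<float>& b,
                                         const std::vector<float>& d,
                                         std::size_t M,
                                         std::size_t N,
                                         std::size_t K,
                                         float alpha,
                                         float beta)
{
    assert(a.size() == M * K && b.size() == K * N && d.size() == M * N);
    std::vector<float> e(M * N);
    for(std::size_t m = 0; m < M; ++m)
    {
        for(std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.f;
            for(std::size_t k = 0; k < K; ++k)
            {
                acc += a[m * K + k] * b[k * N + n];
            }
            e[m * N + n] = alpha * acc + beta * d[m * N + n];
        }
    }
    return e;
}
```

With `alpha = -0.5f, beta = 0.5f` this corresponds to the second `Bilinear` configuration in the test, and with `alpha = 0.5f, beta = 0.f` to the second `Scale` configuration.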
--- a/test/contraction/test_contraction_interface.cpp
+++ b/test/contraction/test_contraction_interface.cpp
+// SPDX-License-Identifier: MIT
+// Copyright (c) 2023, Advanced Micro Devices, Inc. All rights reserved.
+#include <array>
+#include <stdexcept>
+#include <vector>
+#include "gtest/gtest.h"
+#include "ck/tensor_operation/gpu/element/element_wise_operation.hpp"
+#include "ck/tensor_operation/gpu/device/device_contraction_multiple_d.hpp"
+#include "ck/tensor_operation/gpu/device/gemm_specialization.hpp"
+#include "ck/tensor_operation/gpu/device/impl/device_contraction_multiple_d_xdl_cshuffle.hpp"
+#include "ck/library/tensor_operation_instance/gpu/contraction_bilinear.hpp"
+#include "ck/library/utility/device_memory.hpp"
+using Pass     = ck::tensor_operation::element_wise::PassThrough;
+using Bilinear = ck::tensor_operation::element_wise::Bilinear;
+template <ck::index_t... Is>
+using S = ck::Sequence<Is...>;
+using F32 = float;
+using F64 = double;
+template <ck::index_t ABlockTransferSrcVectorDim,
+          ck::index_t BBlockTransferSrcVectorDim,
+          ck::index_t CDEBlockTransferScalarPerVector>
+class ContractionInstanceWrapper
+{
+    public:
+    static constexpr auto GemmSpec = ck::tensor_operation::device::GemmSpecialization::MNKPadding;
+    static constexpr ck::index_t NumDim = 2;
+    // clang-format off
+    using ContractionDeviceInstance = ck::tensor_operation::device::
+        //#####################################| NumDimM| NumDimN| NumDimK| AData| BData| AccData| CShuffle|         DsData| EData|            A|           B|          CDE|           GEMM| NumGemmK| Block|  MPer|  NPer|  KPer| AK1| BK1| MPer| NPer| MXdl| NXdl|  ABlockTransfer| ABlockTransfer| ABlockTransfer|             ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockLds|  BBlockTransfer| BBlockTransfer| BBlockTransfer|             BBlockTransfer| BBlockTransfer| BBlockTransfer| BBlockLds|    CShuffle|    CShuffle| CBlockTransferClusterLengths|                  CBlockTransfer|
+        //#####################################|        |        |        |  Type|  Type|    Type| DataType|           Type|  Type|  Elementwise| Elementwise|  Elementwise| Specialization| Prefetch|  Size| Block| Block| Block|    |    |  XDL|  XDL|  Per|  Per|   ThreadCluster|  ThreadCluster| SrcAccessOrder|               SrcVectorDim|      SrcScalar|      DstScalar| AddExtraM|   ThreadCluster|  ThreadCluster| SrcAccessOrder|               SrcVectorDim|      SrcScalar|      DstScalar| AddExtraN| MXdlPerWave| NXdlPerWave|         _MBlock_MWaveMPerXdl|                 ScalarPerVector|
+        //#####################################|        |        |        |      |      |        |         |               |      |    Operation|   Operation|    Operation|               |    Stage|      |      |      |      |    |    |     |     | Wave| Wave| Lengths_K0_M_K1|   ArrangeOrder|               |                           |      PerVector|   PerVector_K1|          | Lengths_K0_N_K1|   ArrangeOrder|               |                           |      PerVector|   PerVector_K1|          |  PerShuffle|  PerShuffle|         _NBlock_NWaveNPerXdl|                   _NWaveNPerXdl|
+        //#####################################|        |        |        |      |      |        |         |               |      |             |            |             |               |         |      |      |      |      |    |    |     |     |     |     |                |               |               |                           |               |               |          |                |               |               |                           |               |               |          |            |            |                             |                                |
+        DeviceContractionMultipleD_Xdl_CShuffle<  NumDim,  NumDim,  NumDim,   F32,   F32,     F32,      F32, ck::Tuple<F32>,   F32,         Pass,        Pass,     Bilinear,       GemmSpec,        1,   256,   256,   128,    16,   4,   4,   32,   32,    4,    2,     S<4, 64, 1>,     S<1, 0, 2>,     S<1, 0, 2>, ABlockTransferSrcVectorDim,              4,              4,         1,     S<4, 64, 1>,     S<1, 0, 2>,     S<1, 0, 2>, BBlockTransferSrcVectorDim,              4,              4,         1,           1,           1,              S<1, 16, 1, 16>, CDEBlockTransferScalarPerVector>;
+    // clang-format on
+    bool isSupported(std::vector<ck::index_t>& ADims,
+                     std::vector<ck::index_t>& BDims,
+                     std::vector<ck::index_t>& DDims,
+                     std::vector<ck::index_t>& EDims,
+                     std::vector<ck::index_t>& AStrides,
+                     std::vector<ck::index_t>& BStrides,
+                     std::vector<ck::index_t>& DStrides,
+                     std::vector<ck::index_t>& EStrides) const
+    {
+        auto contraction = ContractionDeviceInstance{};
+        auto argument = contraction.MakeArgument(nullptr,
+                                                 nullptr,
+                                                 std::array<const void*, 1>{nullptr},
+                                                 nullptr,
+                                                 ADims,
+                                                 AStrides,
+                                                 BDims,
+                                                 BStrides,
+                                                 std::array<std::vector<ck::index_t>, 1>{DDims},
+                                                 std::array<std::vector<ck::index_t>, 1>{DStrides},
+                                                 EDims,
+                                                 EStrides,
+                                                 Pass{},
+                                                 Pass{},
+                                                 Bilinear{1.f, 1.f});
+        return contraction.IsSupportedArgument(argument);
+    }
+};
+template <typename DataTypeA,
+          typename DataTypeB,
+          typename DataTypeC,
+          typename DataTypeD,
+          ck::index_t NumDim>
+class ContractionDeviceOpWrapper
+{
+    protected:
+    using DeviceOp = ck::tensor_operation::device::DeviceContractionMultipleD<NumDim,
+                                                                              NumDim,
+                                                                              NumDim,
+                                                                              DataTypeA,
+                                                                              DataTypeB,
+                                                                              ck::Tuple<DataTypeC>,
+                                                                              DataTypeD,
+                                                                              Pass,
+                                                                              Pass,
+                                                                              Bilinear>;
+    public:
+    bool IsSupportedInstance(std::vector<ck::index_t>& Dims,
+                             std::vector<ck::index_t>& Strides) const
+    {
+        bool supported     = false;
+        const auto op_ptrs = ck::tensor_operation::device::instance::DeviceOperationInstanceFactory<
+            DeviceOp>::GetInstances();
+        for(auto& op_ptr : op_ptrs)
+        {
+            auto argument_ptr =
+                op_ptr->MakeArgumentPointer(nullptr,
+                                            nullptr,
+                                            std::array<const void*, 1>{nullptr},
+                                            nullptr,
+                                            Dims,
+                                            Strides,
+                                            Dims,
+                                            Strides,
+                                            std::array<std::vector<ck::index_t>, 1>{Dims},
+                                            std::array<std::vector<ck::index_t>, 1>{Strides},
+                                            Dims,
+                                            Strides,
+                                            Pass{},
+                                            Pass{},
+                                            Bilinear{1.f, 1.f});
+            supported = supported || op_ptr->IsSupportedArgument(argument_ptr.get());
+        }
+        return supported;
+    }
+};
+TEST(TestContractionInterface, IncorrectNumDims)
+{
+    std::vector<std::vector<ck::index_t>> Dims    = {{4, 4}, {4, 4, 4, 4}, {4, 4, 4, 4, 4, 4}};
+    std::vector<std::vector<ck::index_t>> Strides = {{1, 1}, {1, 1, 1, 1}, {1, 1, 1, 1, 1, 1}};
+    ContractionDeviceOpWrapper<F32, F32, F32, F32, 1> wrapper_1d;
+    ContractionDeviceOpWrapper<F32, F32, F32, F32, 2> wrapper_2d;
+    ContractionDeviceOpWrapper<F32, F32, F32, F32, 3> wrapper_3d;
+    EXPECT_FALSE(wrapper_1d.IsSupportedInstance(Dims[0], Strides[0]));
+    EXPECT_TRUE(wrapper_2d.IsSupportedInstance(Dims[1], Strides[1]));
+    EXPECT_FALSE(wrapper_3d.IsSupportedInstance(Dims[2], Strides[2]));
+}
+TEST(TestContractionInterface, IncorrectDataTypes)
+{
+    std::vector<ck::index_t> Dims    = {4, 4, 4, 4};
+    std::vector<ck::index_t> Strides = {64, 16, 4, 1};
+    ContractionDeviceOpWrapper<F32, F32, F64, F64, 2> wrapper_1;
+    ContractionDeviceOpWrapper<F64, F64, F32, F32, 2> wrapper_2;
+    EXPECT_FALSE(wrapper_1.IsSupportedInstance(Dims, Strides));
+    EXPECT_FALSE(wrapper_2.IsSupportedInstance(Dims, Strides));
+}
+TEST(TestContractionSupportedArgs, ABMemoryAccess)
+{
+    std::vector<ck::index_t> Dims           = {4, 4, 4, 4};
+    std::vector<ck::index_t> Strides        = {64, 16, 4, 1};
+    std::vector<ck::index_t> StridesM1      = {4, 1, 64, 16};
+    std::vector<ck::index_t> StridesK1      = {64, 16, 4, 1};
+    std::vector<ck::index_t> InvalidStrides = {4, 4, 4, 4};
+    // Memory access to A
+    ContractionInstanceWrapper<1, 2, 4> wrapperA1;
+    ContractionInstanceWrapper<2, 2, 4> wrapperA2;
+    EXPECT_FALSE(
+        wrapperA1.isSupported(Dims, Dims, Dims, Dims, InvalidStrides, Strides, Strides, Strides));
+    EXPECT_FALSE(
+        wrapperA2.isSupported(Dims, Dims, Dims, Dims, InvalidStrides, Strides, Strides, Strides));
+    EXPECT_TRUE(
+        wrapperA1.isSupported(Dims, Dims, Dims, Dims, StridesM1, Strides, Strides, Strides));
+    EXPECT_TRUE(
+        wrapperA2.isSupported(Dims, Dims, Dims, Dims, StridesK1, Strides, Strides, Strides));
+    // Memory access to B
+    ContractionInstanceWrapper<2, 1, 4> wrapperB1;
+    ContractionInstanceWrapper<2, 2, 4> wrapperB2;
+    EXPECT_FALSE(
+        wrapperB1.isSupported(Dims, Dims, Dims, Dims, Strides, InvalidStrides, Strides, Strides));
+    EXPECT_FALSE(
+        wrapperB2.isSupported(Dims, Dims, Dims, Dims, Strides, InvalidStrides, Strides, Strides));
+    EXPECT_TRUE(
+        wrapperB1.isSupported(Dims, Dims, Dims, Dims, Strides, StridesM1, Strides, Strides));
+    EXPECT_TRUE(
+        wrapperB2.isSupported(Dims, Dims, Dims, Dims, Strides, StridesK1, Strides, Strides));
+}
+TEST(TestContractionSupportedArgs, DEMemoryAccess)
+{
+    std::vector<ck::index_t> Dims           = {4, 4, 4, 4};
+    std::vector<ck::index_t> Strides        = {64, 16, 4, 1};
+    std::vector<ck::index_t> InvalidStrides = {64, 16, 1, 4};
+    ContractionInstanceWrapper<2, 2, 4> wrapper;
+    // Memory access to D
+    EXPECT_FALSE(
+        wrapper.isSupported(Dims, Dims, Dims, Dims, Strides, Strides, InvalidStrides, Strides));
+    EXPECT_TRUE(wrapper.isSupported(Dims, Dims, Dims, Dims, Strides, Strides, Strides, Strides));
+    // Memory access to E
+    EXPECT_FALSE(
+        wrapper.isSupported(Dims, Dims, Dims, Dims, Strides, Strides, Strides, InvalidStrides));
+    EXPECT_TRUE(wrapper.isSupported(Dims, Dims, Dims, Dims, Strides, Strides, Strides, Strides));
+}
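The `ABMemoryAccess` and `DEMemoryAccess` cases above are consistent with a simple rule: the dimension along which vector loads or stores happen must be contiguous, and its length must be divisible by the vector width. The actual check is inside `IsSupportedArgument` of `DeviceContractionMultipleD_Xdl_CShuffle`, which this diff does not show, so the following is a simplified sketch under that assumption. The function name `vector_access_ok` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplification of a vectorized-access check:
// the chosen dimension needs unit stride and a length that is a multiple
// of the number of scalars moved per vector instruction.
bool vector_access_ok(const std::vector<std::int32_t>& lengths,
                      const std::vector<std::int32_t>& strides,
                      std::size_t vector_dim,
                      std::int32_t scalar_per_vector)
{
    assert(lengths.size() == strides.size());
    assert(vector_dim < lengths.size());
    assert(scalar_per_vector > 0);
    return strides[vector_dim] == 1 && lengths[vector_dim] % scalar_per_vector == 0;
}
```

Under this rule, strides `{64, 16, 4, 1}` pass when vectorizing along the last index, `{4, 1, 64, 16}` pass when vectorizing along index 1, and both `{4, 4, 4, 4}` and `{64, 16, 1, 4}` fail for the last index, which matches the `EXPECT_TRUE` / `EXPECT_FALSE` pattern in the tests.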