MLIR MIOpen Dialect integration (phase 1) (#768) (#769)

* MLIR MIOpen Dialect integration (phase 1) (#768) * Added Findmlir.cmake (using environment variables to import) * Added mlir_conv pass to GPU target * Apply to any gpu::convolution if supported by MLIR * Call MLIR C-API to generate iGEMM kernel with configuration from gpu::convolution * Capture binary in dictionary for matching convolutions * Build a code_object_op with the binary and execution dimensions * Substitute for the gpu::convolution * Changed the parameters for the code_object to reflect the generated MLIR kernel * Expanded out MemRefDescriptor fields in param list * Also updated for MLIR C-API changes * * fixed global_size calculation * MLIR MIOpen Dialect integration (phase 1) (#768) * Added Findmlir.cmake (using environment variables to import) * Added mlir_conv pass to GPU target * Apply to any gpu::convolution if supported by MLIR * Call MLIR C-API to generate iGEMM kernel with configuration from gpu::convolution * Capture binary in dictionary for matching convolutions * Build a code_object_op with the binary and execution dimensions * Substitute for the gpu::convolution * Changed the parameters for the code_object to reflect the generated MLIR kernel * Expanded out MemRefDescriptor fields in param list * Also updated for MLIR C-API changes * * Added command line option: --enable_mlir * * fixed command line switch * updated for new MLIR API changes * * Added cget llvm-project-mlir to import MIIR API libraries into Dockerfile * removed cmake Findmlir * updated for changes in MIIR C-API * * updated CMakeLists.txt to allow disable of MLIR import * fixed memory leaks and removed copies * updated for 5D memrefs * * formatting * * fixed review comments * * fixed merge issues * hip gcnDeviceName now includes specifiers at the end * use major/minor values instead * * disable MLIR by default * * removed command-line switch --enable-mlir * * fix unused when MLIR disabled * * enable jenkins enable/test MLIR * * format * * fixed clang-tidy * * added new type Co-authored-by: Paul Fultz II <pfultz2@yahoo.com> Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

MLIR MIOpen Dialect integration (phase 1) (#768) (#769)
* MLIR MIOpen Dialect integration (phase 1) (#768) * Added Findmlir.cmake (using environment variables to import) * Added mlir_conv pass to GPU target * Apply to any gpu::convolution if supported by MLIR * Call MLIR C-API to generate iGEMM kernel with configuration from gpu::convolution * Capture binary in dictionary for matching convolutions * Build a code_object_op with the binary and execution dimensions * Substitute for the gpu::convolution * Changed the parameters for the code_object to reflect the generated MLIR kernel * Expanded out MemRefDescriptor fields in param list * Also updated for MLIR C-API changes * * fixed global_size calculation * MLIR MIOpen Dialect integration (phase 1) (#768) * Added Findmlir.cmake (using environment variables to import) * Added mlir_conv pass to GPU target * Apply to any gpu::convolution if supported by MLIR * Call MLIR C-API to generate iGEMM kernel with configuration from gpu::convolution * Capture binary in dictionary for matching convolutions * Build a code_object_op with the binary and execution dimensions * Substitute for the gpu::convolution * Changed the parameters for the code_object to reflect the generated MLIR kernel * Expanded out MemRefDescriptor fields in param list * Also updated for MLIR C-API changes * * Added command line option: --enable_mlir * * fixed command line switch * updated for new MLIR API changes * * Added cget llvm-project-mlir to import MIIR API libraries into Dockerfile * removed cmake Findmlir * updated for changes in MIIR C-API * * updated CMakeLists.txt to allow disable of MLIR import * fixed memory leaks and removed copies * updated for 5D memrefs * * formatting * * fixed review comments * * fixed merge issues * hip gcnDeviceName now includes specifiers at the end * use major/minor values instead * * disable MLIR by default * * removed command-line switch --enable-mlir * * fix unused when MLIR disabled * * enable jenkins enable/test MLIR * * format * * fixed clang-tidy * * added new type Co-authored-by: Paul Fultz II <pfultz2@yahoo.com> Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
56584fa2 · SJW · GitHub · bd0bd7ef · 56584fa2 · 56584fa2
Unverified Commit 56584fa2 authored Apr 29, 2021 by SJW Committed by GitHub Apr 29, 2021
8 changed files
--- a/Dockerfile
+++ b/Dockerfile
@@ -74,7 +74,7 @@ RUN cget -p $PREFIX install facebook/zstd@v1.4.5 -X subdir -DCMAKE_DIR=build/cma
 RUN cget -p $PREFIX install ccache@v4.1
 # Install newer cmake for onnx runtime
-RUN cget -p /opt/cmake install kitware/cmake@v3.13.0
+RUN cget -p /opt/cmake install kitware/cmake@v3.13.4
 ARG ONNXRUNTIME_REPO=https://github.com/Microsoft/onnxruntime
 ARG ONNXRUNTIME_BRANCH=master
@@ -86,6 +86,8 @@ RUN git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --recursive ${ONNXR
 ADD tools/build_and_test_onnxrt.sh /onnxruntime/build_and_test_onnxrt.sh
+RUN PATH=/opt/cmake/bin:$PATH cget -p /usr/local install ROCmSoftwarePlatform/llvm-project-mlir@02078ce236ad90e3aec04c0c770ef5bfc99e49c2
 ENV MIOPEN_FIND_DB_PATH=/tmp/miopen/find-db
 ENV MIOPEN_USER_DB_PATH=/tmp/miopen/user-db
 ENV LD_LIBRARY_PATH=$PREFIX/lib

--- a/Jenkinsfile
+++ b/Jenkinsfile
@@ -94,6 +94,12 @@ rocmtest clang_debug: rocmnode('vega') { cmake_build ->
        cmake_build("/opt/rocm/llvm/bin/clang++", "-DCMAKE_BUILD_TYPE=release")
        stash includes: 'build/*.deb', name: 'migraphx-package'
    }
+}, mlir_debug: rocmnode('vega') { cmake_build ->
+    stage('MLIR Debug') {
+        def sanitizers = "undefined"
+        def debug_flags = "-g -O2 -fsanitize=${sanitizers} -fno-sanitize-recover=${sanitizers}"
+        cmake_build("/opt/rocm/llvm/bin/clang++", "-DCMAKE_BUILD_TYPE=debug -DMIGRAPHX_ENABLE_PYTHON=Off -DMIGRAPHX_ENABLE_MLIR=On -DCMAKE_CXX_FLAGS_DEBUG='${debug_flags}'")
+    }
 }
 def onnxnode(name, body) {

--- a/src/include/migraphx/value.hpp
+++ b/src/include/migraphx/value.hpp
@@ -160,6 +160,7 @@ struct value
        binary(T* data, std::size_t s) : base(data, data + s)
        {
        }
+        explicit binary(std::size_t s) : base(s) {}
    };
    value() = default;

--- a/src/targets/gpu/CMakeLists.txt
+++ b/src/targets/gpu/CMakeLists.txt
@@ -133,6 +133,7 @@ add_library(migraphx_gpu
    logsoftmax.cpp
    lrn.cpp
    leaky_relu.cpp
+    mlir_conv.cpp
    pack_args.cpp
    pack_int8_args.cpp
    pad.cpp
@@ -148,6 +149,7 @@ add_library(migraphx_gpu
    write_literals.cpp
 )
 set_target_properties(migraphx_gpu PROPERTIES EXPORT_NAME gpu)
 function(register_migraphx_gpu_ops PREFIX)
    foreach(OP ${ARGN})
        register_op(migraphx_gpu HEADER migraphx/gpu/${OP}.hpp OPERATORS gpu::${PREFIX}${OP} INCLUDES migraphx/gpu/context.hpp)
@@ -259,6 +261,20 @@ endif()
 message(STATUS "clang-offload-bundler: ${MIGRAPHX_OFFLOADBUNDLER_BIN}")
 message(STATUS "extractkernel: ${MIGRAPHX_EXTRACT_KERNEL}")
+set(MIGRAPHX_ENABLE_MLIR OFF CACHE BOOL "")
+if(MIGRAPHX_ENABLE_MLIR)
+    find_library(LIBMLIRMIOPEN MLIRMIOpenThin REQUIRED)
+    # REQUIRED is not supported before cmake 3.18
+    if(NOT LIBMLIRMIOPEN)
+        message(FATAL_ERROR "libMLIRMIOpenThin not found")
+    else()
+        message(STATUS "Build with libMLIRMIOpenThin: " ${LIBMLIRMIOPEN})
+    endif()
+    target_compile_definitions(migraphx_gpu PRIVATE "-DMIGRAPHX_MLIR_MIOPEN_SUPPORT")
+    target_link_libraries(migraphx_gpu PUBLIC ${LIBMLIRMIOPEN})
+endif()
 # Get flags needed to compile hip
 include(TargetFlags)
 target_flags(HIP_COMPILER_FLAGS hip::device)

--- a/src/targets/gpu/include/migraphx/gpu/context.hpp
+++ b/src/targets/gpu/include/migraphx/gpu/context.hpp
@@ -21,10 +21,20 @@ using hip_event_ptr = MIGRAPHX_MANAGE_PTR(hipEvent_t, hipEventDestroy);
 struct hip_device
 {
-    hip_device() { add_stream(); }
+    hip_device()
+    {
+        device_props.gcnArchName[0]      = '\0';
+        device_props.gcnArch             = 0;
+        device_props.multiProcessorCount = 0;
+        add_stream();
+    }
    hip_device(std::size_t id, std::size_t n) : device_id(id)
    {
+        auto status = hipGetDeviceProperties(&device_props, device_id);
+        if(status != hipSuccess)
+            MIGRAPHX_THROW("Failed to allocate stream");
        for(std::size_t i = 0; i < n; i++)
            add_stream();
    }
@@ -122,10 +132,19 @@ struct hip_device
    std::size_t stream_id() const { return current_stream; }
+    std::string get_device_name() const { return device_props.gcnArchName; }
+    std::size_t get_device_major() const { return device_props.major; }
+    std::size_t get_device_minor() const { return device_props.minor; }
+    std::size_t get_cu_count() const { return device_props.multiProcessorCount; }
    private:
    std::size_t device_id      = 0;
    std::size_t current_stream = 0;
    std::vector<stream> streams;
+    hipDeviceProp_t device_props;
    public:
    std::unordered_map<std::string, argument> preallocations{};

--- a/src/targets/gpu/include/migraphx/gpu/mlir_conv.hpp
+++ b/src/targets/gpu/include/migraphx/gpu/mlir_conv.hpp
+#ifndef MIGRAPHX_GUARD_RTGLIB_MIOPEN_MLIR_CONV_HPP
+#define MIGRAPHX_GUARD_RTGLIB_MIOPEN_MLIR_CONV_HPP
+#include <migraphx/config.hpp>
+#include <migraphx/gpu/context.hpp>
+namespace migraphx {
+inline namespace MIGRAPHX_INLINE_NS {
+struct module;
+namespace gpu {
+struct mlir_conv
+{
+    context* ctx;
+    std::string name() const { return "mlir::convolution"; }
+    void apply(module& m) const;
+};
+} // namespace gpu
+} // namespace MIGRAPHX_INLINE_NS
+} // namespace migraphx
+#endif
--- a/src/targets/gpu/mlir_conv.cpp
+++ b/src/targets/gpu/mlir_conv.cpp
+#include <migraphx/gpu/mlir_conv.hpp>
+#include <migraphx/manage_ptr.hpp>
+#include <migraphx/instruction.hpp>
+#include <migraphx/make_op.hpp>
+#include <migraphx/op/convolution.hpp>
+#include <migraphx/gpu/context.hpp>
+#include <migraphx/gpu/convolution.hpp>
+#include <migraphx/iterator_for.hpp>
+#include <migraphx/program.hpp>
+#include <migraphx/ranges.hpp>
+#include <migraphx/make_op.hpp>
+#include <migraphx/generate.hpp>
+#include <migraphx/program.hpp>
+#include <migraphx/gpu/kernel.hpp>
+#include <migraphx/gpu/target.hpp>
+#include <migraphx/gpu/hip.hpp>
+#include <migraphx/gpu/compile_hip.hpp>
+#include <utility>
+#include <functional>
+#include <algorithm>
+#ifdef MIGRAPHX_MLIR_MIOPEN_SUPPORT
+#include <Miir.h>
+#endif // MIGRAPHX_MLIR_MIOPEN_SUPPORT
+#include <cstdio>
+namespace migraphx {
+inline namespace MIGRAPHX_INLINE_NS {
+namespace gpu {
+struct mlir_apply
+{
+    module* mod           = nullptr;
+    const mlir_conv* pass = nullptr;
+    const char* mlir_kernel_name = "migraphx_conv2d";
+    std::unordered_map<uint64_t, instruction_ref> literal_map{};
+    struct execution_spec
+    {
+        migraphx::value::binary binary;
+        size_t global_size;
+        size_t local_size;
+        execution_spec(migraphx::value::binary&& binary_m, size_t global_s, size_t local_s)
+            : binary(std::move(binary_m)), global_size(global_s), local_size(local_s)
+        {
+        }
+    };
+    std::unordered_map<std::string, std::shared_ptr<execution_spec>> binary_map{};
+    context& get_context() const
+    {
+        assert(pass != nullptr);
+        assert(pass->ctx != nullptr);
+        return *pass->ctx;
+    }
+    void init() const
+    {
+        assert(mod != nullptr);
+        assert(pass != nullptr);
+    }
+    std::shared_ptr<execution_spec> make_mlir_binary(instruction_ref op_r)
+    {
+        std::shared_ptr<execution_spec> result;
+#ifdef MIGRAPHX_MLIR_MIOPEN_SUPPORT
+        auto conv  = any_cast<op::convolution>(op_r->get_operator());
+        auto inp_t = op_r->inputs().at(0)->get_shape();
+        auto flt_t = op_r->inputs().at(1)->get_shape();
+        auto out_t = op_r->get_shape();
+        auto get_type_str = [](const shape& s) -> const char* {
+            switch(s.type())
+            {
+            case shape::float_type: return "f32";
+            case shape::half_type: return "f16";
+            case shape::bool_type:
+            case shape::double_type:
+            case shape::uint8_type:
+            case shape::int8_type:
+            case shape::uint16_type:
+            case shape::int16_type:
+            case shape::int32_type:
+            case shape::int64_type:
+            case shape::uint32_type:
+            case shape::uint64_type:
+            case shape::tuple_type: break;
+            }
+            return nullptr;
+        };
+        const auto* inp_t_s = get_type_str(inp_t);
+        const auto* flt_t_s = get_type_str(flt_t);
+        const auto* out_t_s = get_type_str(out_t);
+        if(out_t_s == nullptr || inp_t_s == nullptr || flt_t_s == nullptr)
+            return result;
+        std::string mlir_options = "--kernel_name " + std::string(mlir_kernel_name);
+        // platform spec
+        auto& device = get_context().get_current_device();
+        char dev_name[64];
+        sprintf(dev_name, "gfx%lu%02lu", device.get_device_major(), device.get_device_minor());
+        mlir_options += " --arch " + std::string(dev_name) + " --num_cu " +
+                        std::to_string(device.get_cu_count()); // ???
+        // Conv spec
+        mlir_options +=
+            " --operation "
+            "conv2d"
+            " --batchsize " +
+            std::to_string(conv.group) + " --groupsize " + std::to_string(1) + " --padding_h " +
+            std::to_string(conv.padding[0]) + " --padding_w " + std::to_string(conv.padding[1]) +
+            " --conv_stride_h " + std::to_string(conv.stride[0]) + " --conv_stride_w " +
+            std::to_string(conv.stride[1]) + " --dilation_h " + std::to_string(conv.dilation[0]) +
+            " --dilation_w " + std::to_string(conv.dilation[1]);
+        // Input spec
+        mlir_options += " --in_layout "
+                        "NCHWG"
+                        " --in_type " +
+                        std::string(inp_t_s) + " --in_channels " + std::to_string(inp_t.lens()[1]) +
+                        " --in_h " + std::to_string(inp_t.lens()[2]) + " --in_w " +
+                        std::to_string(inp_t.lens()[3]);
+        // Filter spec
+        mlir_options += " --fil_layout "
+                        "NCHWG"
+                        " --fil_type " +
+                        std::string(flt_t_s) + " --fil_h " + std::to_string(flt_t.lens()[2]) +
+                        " --fil_w " + std::to_string(flt_t.lens()[3]);
+        // Output spec
+        mlir_options += " --out_layout "
+                        "NCHWG"
+                        " --out_type " +
+                        std::string(out_t_s) + " --out_channels " +
+                        std::to_string(out_t.lens()[1]) + " --out_h " +
+                        std::to_string(out_t.lens()[2]) + " --out_w " +
+                        std::to_string(out_t.lens()[3]);
+        auto bin_i = binary_map.find(mlir_options);
+        if(bin_i == binary_map.end())
+        {
+            size_t bin_size = 0;
+            using mlir_handle = MIGRAPHX_MANAGE_PTR(MiirHandle, miirDestroyHandle);
+            auto handle       = mlir_handle(miirCreateHandle(mlir_options.c_str()));
+            if(miirLowerBin(handle.get()) == MIIR_SUCCESS &&
+               miirBufferGet(handle.get(), nullptr, &bin_size) == MIIR_SUCCESS)
+            {
+                migraphx::value::binary bin(bin_size);
+                if(miirBufferGet(handle.get(), reinterpret_cast<char*>(bin.data()), &bin_size) ==
+                   MIIR_SUCCESS)
+                {
+                    size_t global_size;
+                    size_t block_size;
+                    if(miirGetExecutionDims(handle.get(), &global_size, &block_size) ==
+                       MIIR_SUCCESS)
+                    {
+                        result = std::make_shared<execution_spec>(
+                            std::move(bin), global_size, block_size);
+                    }
+                }
+            }
+            binary_map[mlir_options] = result;
+        }
+        else
+        {
+            result = bin_i->second;
+        }
+#else  // MIGRAPHX_MLIR_MIOPEN_SUPPORT
+        (void)op_r;
+#endif // MIGRAPHX_MLIR_MIOPEN_SUPPORT
+        return result;
+    }
+    instruction_ref get_literal(uint64_t value)
+    {
+        auto fi = literal_map.find(value);
+        if(fi != literal_map.end())
+            return fi->second;
+        auto lit = mod->add_literal(value);
+        literal_map.emplace(value, lit);
+        return lit;
+    }
+    operation make_code_object_op(instruction_ref op_r, const std::shared_ptr<execution_spec>& spec)
+    {
+        // each pointer is expanded out to a MemRefDescriptor
+        auto inp_t = op_r->inputs().at(0)->get_shape();
+        auto flt_t = op_r->inputs().at(1)->get_shape();
+        auto out_t = op_r->get_shape();
+        auto i64 = shape(shape::uint64_type);
+        std::vector<shape> expected_inputs = {
+            flt_t, flt_t, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64,  i64,   inp_t,
+            inp_t, i64,   i64, i64, i64, i64, i64, i64, i64, i64, i64, i64,  out_t, out_t,
+            i64,   i64,   i64, i64, i64, i64, i64, i64, i64, i64, i64, out_t};
+        return migraphx::make_op("gpu::code_object",
+                                 {
+                                     {"code_object", spec->binary},
+                                     {"symbol_name", mlir_kernel_name},
+                                     {"global", spec->global_size},
+                                     {"local", spec->local_size},
+                                     {"expected_inputs", migraphx::to_value(expected_inputs)},
+                                     {"output", migraphx::to_value(out_t)},
+                                 });
+    }
+    void add_memref_descriptor(std::vector<instruction_ref>& refs, instruction_ref inst)
+    {
+        const size_t offset = 0;
+        auto inst_t         = inst->get_shape();
+        refs.push_back(inst);
+        refs.push_back(inst);
+        refs.push_back(get_literal(offset)); // offset
+        // dim sizes
+        std::transform(inst_t.lens().begin(),
+                       inst_t.lens().end(),
+                       std::back_inserter(refs),
+                       [&](const auto& lval) { return get_literal(lval); });
+        refs.push_back(get_literal(1)); // G
+        // dim strides
+        std::transform(inst_t.strides().begin(),
+                       inst_t.strides().end(),
+                       std::back_inserter(refs),
+                       [&](const auto& lval) { return get_literal(lval); });
+        refs.push_back(get_literal(1)); // G
+    }
+    instruction_ref insert_allocation(instruction_ref ins, const shape& s) const
+    {
+        return mod->insert_instruction(ins, hip_allocate{s});
+    }
+    void replace_conv_op(instruction_ref ins)
+    {
+        auto conv_bin = make_mlir_binary(ins);
+        if(conv_bin)
+        {
+            auto conv = make_code_object_op(ins, conv_bin);
+            auto inp = ins->inputs().at(0);
+            auto flt = ins->inputs().at(1);
+            auto out = insert_allocation(ins, ins->get_shape());
+            std::vector<instruction_ref> refs;
+            refs.reserve(3 * 13 + 1);
+            add_memref_descriptor(refs, flt);
+            add_memref_descriptor(refs, inp);
+            add_memref_descriptor(refs, out);
+            refs.push_back(out);
+            mod->replace_instruction(ins, conv, refs);
+        }
+    }
+    void apply()
+    {
+        init();
+        for(auto it : iterator_for(*mod))
+        {
+            if(it->name() == "convolution")
+            {
+                replace_conv_op(it);
+            }
+        }
+    }
+};
+void mlir_conv::apply(module& m) const { mlir_apply{&m, this}.apply(); }
+} // namespace gpu
+} // namespace MIGRAPHX_INLINE_NS
+} // namespace migraphx
--- a/src/targets/gpu/target.cpp
+++ b/src/targets/gpu/target.cpp
@@ -26,6 +26,7 @@
 #include <migraphx/gpu/eliminate_workspace.hpp>
 #include <migraphx/gpu/fuse_ops.hpp>
 #include <migraphx/gpu/lowering.hpp>
+#include <migraphx/gpu/mlir_conv.hpp>
 #include <migraphx/gpu/pack_int8_args.hpp>
 #include <migraphx/gpu/preallocate_param.hpp>
 #include <migraphx/gpu/schedule_model.hpp>
@@ -67,6 +68,7 @@ std::vector<pass> target::get_passes(migraphx::context& gctx, const compile_opti
        simplify_reshapes{},
        propagate_constant{},
        dead_code_elimination{},
+        mlir_conv{&ctx},
        lowering{&ctx, options.offload_copy},
        eliminate_contiguous{"gpu::contiguous"},
        dead_code_elimination{},