Unverified commit 41c0487b authored by Shucai Xiao, committed by GitHub

Module build exec (#765)



* code cleanup

* clang format

* backup code

* clang format

* remove unnecessary code

* clang format

* add module print function

* code backup

* refine the module::print function

* refine the module:to_value() function

* code backup

* backup code changes

* code backup

* remove to_value and from_value function from the module class

* rename a function

* rename the if operator

* refine the if operator

* refine the print function of module and program

* code backup

* code backup

* fix a build warning

* fix overload of compute_shape function

* code backup

* fix unit test error

* fix cppcheck error

* fix the issue related to the overload of compute_shape

* fix review comments

* fix cppcheck error

* change the name returned by if_op to "if"

* clang format

* fix two unit tests

* clang format

* rename variables

* clang format

* remove the unused compute_op function

* clang format

* add lowering of if operator and compute_op function

* clang format

* add parsing if operator in onnx file

* clang format

* fix clang tidy format

* clang format

* add the gpu implementation of the if operator

* enhance the validate function and uncomment a unit test

* clang format

* remove unnecessary code

* add sub_module processing in ref passes

* clang format

* clang format

* fix a hang issue related to the validate function

* fix an issue in replace_refs

* clang format

* fix review comments

* clang format

* fix cppcheck error

* clang format

* add a unit test for more code coverage

* clang format

* fix review comments and add test for more code coverage

* clang format

* fix cppcheck error

* clang format

* fix cppcheck error

* fix a cppcheck error

* clang format

* backup code

* clang format

* fix cppcheck error

* clang format

* some code refinement

* clang format

* code backup to handle submodules in module compilation

* clang format

* code backup

* clang format

* code backup

* clang format

* fix a bug related to literal id

* fix a bug in gpu execution

* change the way of compiling a graph

* clang format

* backup more changes

* clang format

* refine pass log information

* remove unnecessary code

* clang format

* temp changes backup

* clang format

* add module name prefix to scratch memory id in hip_memory_allocation

* clang format

* change to copy the cond input by inserting a copy instruction

* clang format

* change to use the if output argument as the submodule output so a gpu_copy can be removed

* clang format

* consider submodules in some compile passes

* clang format

* fix review comments

* clang format

* fix issues related to scratch memory

* clang format

* remove unnecessary code

* fix cppcheck error

* clang format

* resolve the implicit dependencies issue related to submodules

* clang format

* fix cppcheck error

* clang format

* backup temp changes

* clang format

* fixed a bug in the has_instruction function

* clang format

* fix the return value of the gpu implementation of the if operator

* fix a bug in the compute_shape function in the gpu implementation

* add an if onnx unit test

* clang format

* add more unit tests

* clang format

* tmp code backup

* clang format

* fix a sync problem related to copy cond argument from gpu to cpu

* clang format

* change the compile offload copy flag setting

* clang format

* enable the copy from cpu to perform a synchronous copy

* clang format

* add more unit tests

* add more unit tests

* add more ref unit tests

* clang format

* fixed a bug

* tmp code backup

* clang format

* fixed an onnx verify unit test

* add more unit tests

* clang format

* revert a change

* fix cppcheck error

* fix cppcheck error

* fix printing of all instructions during program execution

* clang format

* fix bugs related to memory coloring and offload copy to be true

* clang format

* remove unnecessary include header file

* sort test cases in ref_cpu_ops alphabetically

* clang format

* add a flag to disable cpu target in verification test

* change the way to disable some tests

* clang format

* disable verify unit test of the if operators

* add a function call to have more code coverage

* fix a build error

* fix review comments

* fix review comments

* clang format

* add an api gpu unit test for more code coverage

* clang format

* change to use instruction.size() as node index

* move the calc_implicit_deps function to module class as a member function

* clang format

* move the offload_copy flag setting to lowering

* clang format

* assign the module_eval lambda function to a variable to simplify code

* clang format

* move the compute function from ref/gpu implementation to the main if operator

* clang format

* fix cppcheck error

* add a unit test for more code coverage

* clang format

* add unit test to calculate implicit deps

* add a python unit test

* clang format

* refine a unit test to have more code coverage

* clang format

* change the way of wrapping up arguments for submodules

* clang format

* fix some build errors

* code cleanup

* refine unit tests to have more code coverage

* clang format

* refine unit test to have more code coverage

* code backup

* clang format

* add memory coloring test

* refine memory coloring unit test

* clang format

* remove an unnecessary line

* remove an unused line

* remove an unnecessary parameter in the lambda function

* clang format

* refine a unit test

* remove an unnecessary line

* refine unit tests to have more code coverage

* clang format

* combine two lines

* add one more unit test for more code coverage

* clang format

* add one more unit test

* clang format

* fix review comments

* refine a print-out message

* fix review comments

* clang format

* change the sync copy to use a gpu device sync

* clang format

* remove unnecessary code
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
parent 5d601ad1
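Taken together, these commits add module-based build and execution so that programs containing control flow (the ONNX If operator, parsed into submodules) compile and run on the ref and gpu targets. A minimal sketch of what this enables, adapted from the unit tests included in this diff (the model file name and gold values follow the if_literal_test below; treat this as illustrative, not part of the change):

    #include <migraphx/onnx.hpp>
    #include <migraphx/ref/target.hpp>

    int main()
    {
        // The If branches of the model are parsed into submodules of the program.
        migraphx::program p = migraphx::parse_onnx("if_literal_test.onnx");
        p.compile(migraphx::ref::target{});

        // Feed the boolean condition; only the selected branch module is evaluated.
        migraphx::shape cond_s{migraphx::shape::bool_type};
        char cond = 1; // true: take the then branch
        migraphx::parameter_map pp;
        pp["cond"]  = migraphx::argument(cond_s, &cond);
        auto result = p.eval(pp).back(); // {1, 2, 3, 4, 5} per the test's gold data
    }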
@@ -66,12 +66,19 @@ struct hip_sync_device
     }
     std::string name() const { return "hip::sync_device"; }
-    shape compute_shape(const std::vector<shape>&) const { return {}; }
+    shape compute_shape(const std::vector<shape>& inputs) const
+    {
+        if(inputs.empty())
+            return {};
+        return inputs.front();
+    }
-    argument compute(context&, const shape&, const std::vector<argument>&) const
+    argument compute(context&, const shape&, const std::vector<argument>& args) const
     {
         gpu_sync();
-        return {};
+        if(args.empty())
+            return {};
+        return args.front();
     }
 };
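The rework above makes hip::sync_device forward its first input (when present) instead of discarding it, so a consumer can be sequenced after the synchronization. A short sketch of the intended pattern, mirroring the add_if_op lowering later in this diff:

    // Copy the branch condition to the host, then gate its consumers on a device sync:
    auto cpu_cond  = mod->insert_instruction(ins, hip_copy_from_gpu{}, inputs.front());
    auto sync_cond = mod->insert_instruction(ins, hip_sync_device{}, cpu_cond);
    inputs.front() = sync_cond; // downstream reads now happen after gpu_sync()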
@@ -119,7 +126,6 @@ struct hip_copy_from_gpu
         return result;
     }
     copy_from_gpu(ctx, args[0], args[1]);
     return args[1];
 }
 std::ptrdiff_t output_alias(const std::vector<shape>& args) const
......
@@ -15,7 +15,7 @@ struct lowering
     context* ctx;
     bool offload_copy;
     std::string name() const { return "gpu::lowering"; }
-    void apply(module& p) const;
+    void apply(module& m) const;
 };
 } // namespace gpu
......
@@ -9,6 +9,7 @@
 #include <migraphx/op/deconvolution.hpp>
 #include <migraphx/op/dot.hpp>
 #include <migraphx/op/elu.hpp>
+#include <migraphx/op/if_op.hpp>
 #include <migraphx/op/leaky_relu.hpp>
 #include <migraphx/op/lrn.hpp>
 #include <migraphx/op/pooling.hpp>
@@ -42,6 +43,7 @@
 #include <utility>
 #include <functional>
 #include <algorithm>
+#include <map>

 namespace migraphx {
 inline namespace MIGRAPHX_INLINE_NS {
@@ -49,11 +51,12 @@ namespace gpu {

 struct miopen_apply
 {
-    module* prog         = nullptr;
+    module* mod          = nullptr;
     const lowering* pass = nullptr;
     std::unordered_map<std::string, std::function<instruction_ref(instruction_ref)>> apply_map{};
     instruction_ref last{};
     std::unordered_map<instruction_ref, std::string> prog_output_names{};
+    bool offload_copy = false;

     context& get_context() const
     {
@@ -71,7 +74,7 @@ struct miopen_apply
     void create_output_names()
     {
-        this->last = instruction::get_output_alias(std::prev(prog->end()));
+        this->last = instruction::get_output_alias(std::prev(mod->end()));
         if(this->last->name() == "@return")
         {
             const auto& prog_outputs = last->inputs();
@@ -85,16 +88,17 @@ struct miopen_apply
             std::size_t index = 0;
             for(auto ins : outputs_alias)
             {
-                prog_output_names[ins] = "#output_" + std::to_string(index++);
+                prog_output_names[ins] = mod->name() + ":#output_" + std::to_string(index++);
             }
         }
     }

     void init()
     {
-        assert(prog != nullptr);
+        assert(mod != nullptr);
         assert(pass != nullptr);
+        offload_copy = (mod->name() == "main") ? pass->offload_copy : false;

         create_output_names();
         add_generic_op("acos");
@@ -169,26 +173,27 @@ struct miopen_apply
         add_quant_convolution_op();
         add_batch_norm_inference_op();
         add_neg_op();
+        add_if_op();
     }

     void copy_params()
     {
-        if(not pass->offload_copy)
+        if(not offload_copy)
             return;

-        for(auto ins : iterator_for(*prog))
+        for(auto ins : iterator_for(*mod))
         {
             if(ins->name() != "@param")
                 continue;

             auto pos = std::next(ins);
             auto a   = insert_allocation(pos, ins->get_shape());
-            auto c   = prog->insert_instruction(pos, hip_copy_to_gpu{}, ins, a);
-            prog->replace_instruction(ins, c);
+            auto c   = mod->insert_instruction(pos, hip_copy_to_gpu{}, ins, a);
+            mod->replace_instruction(ins, c);
         }

         // return instruction
-        auto ret = std::prev(prog->end());
+        auto ret = std::prev(mod->end());
         if(ret->name() == "@return")
         {
             const auto& inputs = ret->inputs();
@@ -197,21 +202,21 @@ struct miopen_apply
             // output with copy output
             for(const auto& in : inputs)
             {
-                auto p_output = prog->insert_instruction(ret, hip_copy_from_gpu{}, in);
+                auto p_output = mod->insert_instruction(ret, hip_copy_from_gpu{}, in);
                 instruction::replace_argument(ret, in, p_output);
             }
         }
         // else branch to handle legacy program without the return instruction
         else
         {
-            prog->add_instruction(hip_copy_from_gpu{}, ret);
+            mod->add_instruction(hip_copy_from_gpu{}, ret);
         }
     }

     void apply()
     {
         init();
-        for(auto it = prog->begin(); it != prog->end(); it++)
+        for(auto it = mod->begin(); it != mod->end(); it++)
         {
             auto s = it->get_shape();
             if(apply_map.count(it->name()) > 0)
@@ -226,23 +231,23 @@ struct miopen_apply
     instruction_ref insert_allocation(instruction_ref ins, const shape& s, std::string tag = "")
     {
         // Instruction's output is an input of the ret instruction
-        if(pass->offload_copy)
+        if(offload_copy)
         {
-            auto result = prog->insert_instruction(ins, hip_allocate{s, std::move(tag)});
+            auto result = mod->insert_instruction(ins, hip_allocate{s, std::move(tag)});
             return result;
         }

         auto ins_alias = instruction::get_output_alias(ins);
         if(last->name() == "@return" and tag.empty() and prog_output_names.count(ins_alias) > 0)
         {
-            return prog->add_parameter(prog_output_names[ins_alias], s);
+            return mod->add_parameter(prog_output_names[ins_alias], s);
         }
         else if(ins == last and tag.empty())
         {
-            return prog->add_parameter("output", s);
+            return mod->add_parameter("output", s);
         }
-        return prog->insert_instruction(ins, hip_allocate{s, std::move(tag)});
+        return mod->insert_instruction(ins, hip_allocate{s, std::move(tag)});
     }

     void add_convolution_op()
@@ -256,7 +261,7 @@ struct miopen_apply
             auto workspace = insert_allocation(ins, ws, "workspace");
             auto output    = insert_allocation(ins, ins->get_shape());

-            return prog->replace_instruction(
+            return mod->replace_instruction(
                 ins, conv, ins->inputs().at(0), ins->inputs().at(1), workspace, output);
         });
     }
@@ -272,7 +277,7 @@ struct miopen_apply
             auto workspace = insert_allocation(ins, ws, "workspace");
             auto output    = insert_allocation(ins, ins->get_shape());

-            return prog->replace_instruction(
+            return mod->replace_instruction(
                 ins, conv, ins->inputs().at(0), ins->inputs().at(1), workspace, output);
         });
     }
@@ -296,7 +301,7 @@ struct miopen_apply
             if(ins == last or refs.back()->outputs().size() > 1 or c_alias->inputs().empty())
             {
                 auto output   = insert_allocation(ins, ins->get_shape());
-                auto copy_out = prog->insert_instruction(ins, hip_copy{}, refs.back(), output);
+                auto copy_out = mod->insert_instruction(ins, hip_copy{}, refs.back(), output);
                 refs.back()   = copy_out;
                 refs.push_back(copy_out);
             }
@@ -306,7 +311,7 @@ struct miopen_apply
             }
         }

-            return prog->replace_instruction(ins, rocblas_gemm<Op>{Op{op.alpha, beta}}, refs);
+            return mod->replace_instruction(ins, rocblas_gemm<Op>{Op{op.alpha, beta}}, refs);
         });
     }
@@ -321,7 +326,7 @@ struct miopen_apply
             auto workspace = insert_allocation(ins, ws, "workspace");
             auto output    = insert_allocation(ins, ins->get_shape());

-            return prog->replace_instruction(ins, conv, args[0], args[1], workspace, output);
+            return mod->replace_instruction(ins, conv, args[0], args[1], workspace, output);
         });
     }
@@ -334,7 +339,7 @@ struct miopen_apply
             std::vector<instruction_ref> refs = ins->inputs();
             refs.push_back(output);

-            return prog->replace_instruction(ins, make_op(gpu_name), refs);
+            return mod->replace_instruction(ins, make_op(gpu_name), refs);
         });
     }
@@ -348,7 +353,7 @@ struct miopen_apply
             std::vector<instruction_ref> refs = ins->inputs();
             refs.push_back(output);

-            return prog->replace_instruction(ins, make_op(gpu_name, op.to_value()), refs);
+            return mod->replace_instruction(ins, make_op(gpu_name, op.to_value()), refs);
         });
     }
@@ -376,9 +381,9 @@ struct miopen_apply
             std::transform(ins->inputs().begin() + 1,
                            ins->inputs().end(),
                            std::back_inserter(reshapes),
-                           [&](auto i) { return prog->insert_instruction(ins, reshape_op, i); });
+                           [&](auto i) { return mod->insert_instruction(ins, reshape_op, i); });

-            return prog->replace_instruction(ins,
+            return mod->replace_instruction(ins,
                                              miopen_batch_norm_inference{op},
                                              input,
                                              reshapes[0],
@@ -396,15 +401,53 @@ struct miopen_apply
         apply_map.emplace("neg", [=](instruction_ref ins) {
             auto s = ins->get_shape();
             std::vector<float> zeros(s.elements(), 0.0f);
-            auto l0     = prog->add_literal(literal(s, zeros));
+            auto l0     = mod->add_literal(literal(s, zeros));
             auto output = insert_allocation(ins, s);
-            return prog->replace_instruction(
+            return mod->replace_instruction(
                 ins, make_op("gpu::sub"), l0, ins->inputs().front(), output);
         });
     }

+    // replace the if operator with gpu_if operator
+    void add_if_op()
+    {
+        apply_map.emplace("if", [=](instruction_ref ins) {
+            std::vector<instruction_ref> inputs = ins->inputs();
+            auto cpu_cond  = mod->insert_instruction(ins, hip_copy_from_gpu{}, inputs.front());
+            auto sync_cond = mod->insert_instruction(ins, hip_sync_device{}, cpu_cond);
+            inputs.front() = sync_cond;
+
+            std::vector<module_ref> mod_args = ins->module_inputs();
+            std::map<std::string, shape> name_shapes;
+            for(const auto& smod : mod_args)
+            {
+                auto ps = smod->get_parameter_shapes();
+                name_shapes.insert(ps.begin(), ps.end());
+            }
+
+            bool ins_output_allocated = false;
+            for(auto& pn : name_shapes)
+            {
+                const auto& s = pn.second;
+                instruction_ref output{};
+                if(s == ins->get_shape() and not ins_output_allocated)
+                {
+                    output               = insert_allocation(ins, s);
+                    ins_output_allocated = true;
+                }
+                else
+                {
+                    output = mod->insert_instruction(ins, hip_allocate{s});
+                }
+                inputs.push_back(output);
+            }
+
+            return mod->replace_instruction(ins, ins->get_operator(), inputs, mod_args);
+        });
+    }
 };

-void lowering::apply(module& p) const { miopen_apply{&p, this}.apply(); }
+void lowering::apply(module& m) const { miopen_apply{&m, this}.apply(); }

 } // namespace gpu
 } // namespace MIGRAPHX_INLINE_NS
 } // namespace migraphx
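A reading note on add_if_op above (a sketch, not part of the diff): after lowering, the if instruction's input list starts with the host-synced condition and is followed by one GPU buffer per distinct submodule output parameter. Illustratively, with hypothetical instruction names:

    // before: ret = if(cond) { then_mod, else_mod }
    // after add_if_op (sketch):
    //   cpu_cond  = hip::copy_from_gpu(cond)    // bring cond to the host
    //   sync_cond = hip::sync_device(cpu_cond)  // make the host read safe
    //   buf0..N   = hip::allocate / insert_allocation  // one per submodule output param
    //   ret       = if(sync_cond, buf0..N) { then_mod, else_mod }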
@@ -16,9 +16,9 @@ void preallocate_param::apply(module& p) const
     {
         if(ins->name() != "@param")
             continue;

-        std::string id = any_cast<builtin::param>(ins->get_operator()).parameter;
-        if(id != param)
+        if(param != any_cast<builtin::param>(ins->get_operator()).parameter)
             continue;

+        std::string id = p.name() + ":" + param;
         auto r = p.insert_instruction(ins, hip_allocate_memory{ins->get_shape(), id});
         p.replace_instruction(ins, r);
     }
......
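With the module-name prefix above, each module's preallocated buffer gets a distinct id. Assuming the pass parameter is named "scratch" (as in the memory-coloring test later in this diff), the resulting ids would be, for example:

    main:scratch
    If_0_if:scratch
    If_0_else:scratch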
@@ -18,7 +18,11 @@ void sync_device::apply(module& p) const
             return (i->name() == "hip::copy_from_gpu");
         }))
     {
-        p.insert_instruction(last, hip_sync_device{}, inputs);
+        auto sync_in = p.insert_instruction(last, hip_sync_device{}, inputs);
+        if(not inputs.empty())
+        {
+            p.replace_instruction(inputs.front(), sync_in);
+        }
     }
 }
 }
......
@@ -28,7 +28,7 @@ void write_literals::apply(module& p) const
     }
     else
     {
-        std::string id = "@literal:" + std::to_string(n);
+        std::string id = p.name() + ":@literal:" + std::to_string(n);
         p.replace_instruction(ins, hip_copy_literal{ins->get_literal(), id});
         n++;
     }
......
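Literal ids get the same per-module namespacing, which avoids collisions between a submodule's literals and the main module's; e.g. (hypothetical ids):

    main:@literal:0
    If_0_if:@literal:0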
@@ -10,6 +10,7 @@
 #include <migraphx/op/dot.hpp>
 #include <migraphx/op/quant_dot.hpp>
 #include <migraphx/op/elu.hpp>
+#include <migraphx/op/if_op.hpp>
 #include <migraphx/op/im2col.hpp>
 #include <migraphx/op/leaky_relu.hpp>
 #include <migraphx/op/logsoftmax.hpp>
@@ -887,7 +888,7 @@ MIGRAPHX_REGISTER_OP(ref_rnn_var_sl_last_output)
 struct ref_apply
 {
-    module* modl;
+    module* mod;
     std::unordered_map<std::string, std::function<void(instruction_ref)>> apply_map{};

     template <class T>
@@ -927,7 +928,7 @@ struct ref_apply
     void apply()
     {
         init();
-        for(auto it : iterator_for(*modl))
+        for(auto it : iterator_for(*mod))
         {
             if(it->name() == "pooling")
             {
@@ -946,29 +947,29 @@ struct ref_apply
     void apply_ref_op(instruction_ref ins) const
     {
-        modl->replace_instruction(ins, ref_op{ins->get_operator()}, ins->inputs());
+        mod->replace_instruction(ins, ref_op{ins->get_operator()}, ins->inputs());
     }

     template <class T>
     void apply_simple_op(instruction_ref ins)
     {
-        modl->replace_instruction(ins, T{}, ins->inputs());
+        mod->replace_instruction(ins, T{}, ins->inputs());
     }

     template <class T, class Op>
     void apply_extend_op(instruction_ref ins)
     {
         auto&& op = any_cast<Op>(ins->get_operator());
-        modl->replace_instruction(ins, T{op}, ins->inputs());
+        mod->replace_instruction(ins, T{op}, ins->inputs());
     }

     void apply_pooling(instruction_ref ins) const
     {
         auto&& op = any_cast<op::pooling>(ins->get_operator());
         if(op.mode == "max")
-            modl->replace_instruction(ins, ref_pooling<max_pool>{op}, ins->inputs());
+            mod->replace_instruction(ins, ref_pooling<max_pool>{op}, ins->inputs());
         else if(op.mode == "average")
-            modl->replace_instruction(ins, ref_pooling<avg_pool>{op}, ins->inputs());
+            mod->replace_instruction(ins, ref_pooling<avg_pool>{op}, ins->inputs());
     }
 };
......
+#include <numeric>
 #include <migraphx/migraphx.h>
 #include <migraphx/migraphx.hpp>
 #include "test.hpp"
@@ -24,4 +25,53 @@ TEST_CASE(load_and_run)
     CHECK(bool{shapes_before.front() == outputs.front().get_shape()});
 }
TEST_CASE(if_pl_test)
{
    auto run_prog = [&](auto cond) {
        auto p             = migraphx::parse_onnx("if_pl_test.onnx");
        auto shapes_before = p.get_output_shapes();
        migraphx_compile_options options;
        options.offload_copy = true;
        p.compile(migraphx::target("gpu"), options);
        auto shapes_after = p.get_output_shapes();
        CHECK(shapes_before.size() == 1);
        CHECK(bool{shapes_before.front() == shapes_after.front()});

        migraphx::program_parameters pp;
        auto param_shapes = p.get_parameter_shapes();
        auto xs           = param_shapes["x"];
        std::vector<float> xd(xs.bytes() / sizeof(float), 1.0);
        pp.add("x", migraphx::argument(xs, xd.data()));
        auto ys = param_shapes["y"];
        std::vector<float> yd(ys.bytes() / sizeof(float), 2.0);
        pp.add("y", migraphx::argument(ys, yd.data()));
        char ccond = static_cast<char>(cond);
        pp.add("cond", migraphx::argument(param_shapes["cond"], &ccond));

        auto outputs = p.eval(pp);
        auto output  = outputs[0];
        auto lens    = output.get_shape().lengths();
        auto elem_num =
            std::accumulate(lens.begin(), lens.end(), 1, std::multiplies<std::size_t>());
        float* data_ptr = reinterpret_cast<float*>(output.data());
        std::vector<float> ret(data_ptr, data_ptr + elem_num);

        return ret;
    };

    // then branch
    {
        auto result_vector      = run_prog(true);
        std::vector<float> gold = {2, 3, 4, 5, 6, 7};
        EXPECT(result_vector == gold);
    }

    // else branch
    {
        auto result_vector      = run_prog(false);
        std::vector<float> gold = {1, 2, 3, 4, 5, 6};
        EXPECT(result_vector == gold);
    }
}
int main(int argc, const char* argv[]) { test::run(argc, argv); }
@@ -3,6 +3,7 @@
 #include <migraphx/check_shapes.hpp>
 #include <migraphx/generate.hpp>
 #include <migraphx/instruction.hpp>
+#include <migraphx/make_op.hpp>
 #include <basic_ops.hpp>
 #include <test.hpp>
@@ -637,6 +638,48 @@ TEST_CASE(test38)
     CHECK(no_allocate(m));
 }
TEST_CASE(test39)
{
    migraphx::program p;
    auto* mm = p.get_main_module();
    migraphx::shape cond_s{migraphx::shape::bool_type};
    auto cond   = add_alloc(*mm, cond_s);
    auto output = mm->add_parameter("output", {migraphx::shape::float_type, {20}});

    migraphx::shape ds{migraphx::shape::float_type, {2, 3}};
    std::vector<float> data1 = {0.384804, -1.77948, -0.453775, 0.477438, -1.06333, -1.12893};
    auto l1                  = mm->add_literal(migraphx::literal(ds, data1));
    std::vector<float> data2 = {-0.258047, 0.360394, 0.536804, -0.577762, 1.0217, 1.02442};
    auto l2                  = mm->add_literal(migraphx::literal(ds, data2));

    auto* then_mod = p.create_module("If_0_if");
    auto i1        = add_alloc(*then_mod, ds);
    auto a1        = then_mod->add_instruction(pass_op{}, i1, l1);
    then_mod->add_return({a1, output});

    auto* else_mod = p.create_module("If_0_else");
    auto i2        = add_alloc(*else_mod, ds);
    auto a2        = else_mod->add_instruction(pass_op{}, i2, l2);
    else_mod->add_return({a2, output});

    auto ret = mm->add_instruction(mod_pass_op{}, {cond}, {then_mod, else_mod});
    mm->add_return({ret, output});

    auto sub_modules = p.get_modules();
    std::reverse(sub_modules.begin(), sub_modules.end());
    for(auto& smod : sub_modules)
    {
        run_pass(*smod);
    }

    CHECK(mm->get_parameter_shape("scratch").bytes() == 4);
    CHECK(then_mod->get_parameter_shape("scratch").bytes() == 24);
    CHECK(else_mod->get_parameter_shape("scratch").bytes() == 24);

    CHECK(no_allocate(*mm));
    CHECK(no_allocate(*then_mod));
    CHECK(no_allocate(*else_mod));
}
TEST_CASE(literal_test)
{
    migraphx::program p;
......
@@ -2,6 +2,7 @@
 #include <migraphx/iterator_for.hpp>
 #include <migraphx/instruction.hpp>
 #include <migraphx/ref/target.hpp>
+#include <migraphx/ranges.hpp>
 #include <sstream>
 #include "test.hpp"
 #include <migraphx/make_op.hpp>
@@ -203,4 +204,51 @@ TEST_CASE(submodule_copy)
     EXPECT(mm.get_sub_modules() == mm2.get_sub_modules());
 }
TEST_CASE(calc_implicit_deps)
{
    migraphx::program p;
    auto* mm = p.get_main_module();
    migraphx::shape cond_s{migraphx::shape::bool_type};
    migraphx::shape xs{migraphx::shape::float_type, {2, 3}};
    migraphx::shape ys{migraphx::shape::float_type, {3, 3}};
    std::vector<float> datax = {1, 2, 3, 4, 5, 6};
    std::vector<float> datay = {8, 7, 6, 5, 4, 3, 2, 1, 0};
    auto lx   = mm->add_literal(migraphx::literal(xs, datax));
    auto ly   = mm->add_literal(migraphx::literal(ys, datay));
    auto cond = mm->add_parameter("cond", cond_s);
    auto x1   = mm->add_parameter("x1", xs);
    auto x2   = mm->add_parameter("x2", xs);
    auto y2   = mm->add_parameter("y2", ys);

    auto* then_mod = p.create_module("If_5_if");
    auto l1        = then_mod->add_literal(migraphx::literal(ys, datay));
    auto a1        = then_mod->add_instruction(migraphx::make_op("add"), x1, lx);
    then_mod->add_return({a1, l1});

    auto* then_mod1 = p.create_module("If_6_if");
    auto l11        = then_mod1->add_literal(migraphx::literal(ys, datay));
    auto a11        = then_mod1->add_instruction(migraphx::make_op("add"), x2, lx);
    then_mod1->add_return({a11, l11});

    auto* else_mod1 = p.create_module("If_6_else");
    auto l21        = else_mod1->add_literal(migraphx::literal(xs, datax));
    auto a21        = else_mod1->add_instruction(migraphx::make_op("mul"), y2, ly);
    else_mod1->add_return({l21, a21});

    auto* else_mod = p.create_module("If_5_else");
    auto l2        = else_mod->add_literal(migraphx::literal(ys, datay));
    auto a2 = else_mod->add_instruction(migraphx::make_op("if"), {cond}, {then_mod1, else_mod1});
    else_mod->add_return({a2, l2});

    auto ret = mm->add_instruction(migraphx::make_op("if"), {cond}, {then_mod, else_mod});
    mm->add_return({ret});

    auto implicit_deps = mm->calc_implicit_deps();
    EXPECT(migraphx::contains(implicit_deps, ret));
    EXPECT(migraphx::contains(implicit_deps.at(ret), x1));
    EXPECT(migraphx::contains(implicit_deps.at(ret), x2));
    EXPECT(migraphx::contains(implicit_deps.at(ret), y2));
}
int main(int argc, const char* argv[]) { test::run(argc, argv); }
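A note on what the calc_implicit_deps test above checks, as a sketch in comments (summarizing the expectations, not part of the diff):

    // ret = if(cond) { If_5_if, If_5_else } lists only cond as an explicit input,
    // but its branch modules read main-module values: If_5_if reads x1 (and the
    // main-module literal lx), while the nested If_6 modules read x2 and y2.
    // calc_implicit_deps() therefore reports x1, x2 and y2 as implicit inputs of
    // ret, which the EXPECT checks assert via migraphx::contains.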
@@ -1589,6 +1589,254 @@ def if_else_test():
     return ([node], [x, y], [res], [cond_tensor, xt_tensor, yt_tensor])
@onnx_test
def if_literal_test():
    then_out = onnx.helper.make_tensor_value_info('then_out',
                                                  onnx.TensorProto.FLOAT, [5])
    else_out = onnx.helper.make_tensor_value_info('else_out',
                                                  onnx.TensorProto.FLOAT, [5])

    x = np.array([1, 2, 3, 4, 5]).astype(np.float32)
    y = np.array([5, 4, 3, 2, 1]).astype(np.float32)

    then_const_node = onnx.helper.make_node(
        'Constant',
        inputs=[],
        outputs=['then_out'],
        value=onnx.numpy_helper.from_array(x))

    else_const_node = onnx.helper.make_node(
        'Constant',
        inputs=[],
        outputs=['else_out'],
        value=onnx.numpy_helper.from_array(y))

    then_body = onnx.helper.make_graph([then_const_node], 'then_body', [],
                                       [then_out])
    else_body = onnx.helper.make_graph([else_const_node], 'else_body', [],
                                       [else_out])

    cond_input = onnx.helper.make_tensor_value_info('cond',
                                                    onnx.TensorProto.BOOL, [])
    ret = onnx.helper.make_tensor_value_info('ret', TensorProto.FLOAT, [])

    node = onnx.helper.make_node('If',
                                 inputs=['cond'],
                                 outputs=['ret'],
                                 then_branch=then_body,
                                 else_branch=else_body)

    return ([node], [cond_input], [ret])


@onnx_test
def if_param_excp_test():
    then_out = onnx.helper.make_tensor_value_info('then_out',
                                                  onnx.TensorProto.FLOAT,
                                                  [2, 3])
    else_out = onnx.helper.make_tensor_value_info('else_out',
                                                  onnx.TensorProto.FLOAT,
                                                  [2, 3])
    x = onnx.helper.make_tensor_value_info('x', onnx.TensorProto.FLOAT, [2, 3])
    y = onnx.helper.make_tensor_value_info('y', onnx.TensorProto.FLOAT, [2, 4])

    yt = np.random.randn(2, 4).astype(np.float32)
    xt = np.random.randn(2, 3).astype(np.float32)
    xt_tensor = helper.make_tensor(name='xt',
                                   data_type=TensorProto.FLOAT,
                                   dims=xt.shape,
                                   vals=xt.flatten().astype(np.float32))
    yt_tensor = helper.make_tensor(name='yt',
                                   data_type=TensorProto.FLOAT,
                                   dims=yt.shape,
                                   vals=yt.flatten().astype(np.float32))

    then_add_node = onnx.helper.make_node('Add',
                                          inputs=['x', 'xt'],
                                          outputs=['then_out'])
    else_mul_node = onnx.helper.make_node('Mul',
                                          inputs=['y', 'yt'],
                                          outputs=['else_out'])

    then_body = onnx.helper.make_graph([then_add_node], 'then_body', [],
                                       [then_out], [xt_tensor])
    else_body = onnx.helper.make_graph([else_mul_node], 'else_body', [],
                                       [else_out], [yt_tensor])

    cond_input = onnx.helper.make_tensor_value_info('cond',
                                                    onnx.TensorProto.BOOL, [])
    ret = onnx.helper.make_tensor_value_info('ret', TensorProto.FLOAT, [])

    node = onnx.helper.make_node('If',
                                 inputs=['cond'],
                                 outputs=['ret'],
                                 then_branch=then_body,
                                 else_branch=else_body)

    return ([node], [cond_input, x, y], [ret])


@onnx_test
def if_param_excp1_test():
    then_out = onnx.helper.make_tensor_value_info('sub_out',
                                                  onnx.TensorProto.FLOAT,
                                                  [2, 3])
    x = onnx.helper.make_tensor_value_info('x', onnx.TensorProto.FLOAT, [2, 3])

    xt = np.random.randn(2, 3).astype(np.float32)
    xt_tensor = helper.make_tensor(name='xt',
                                   data_type=TensorProto.FLOAT,
                                   dims=xt.shape,
                                   vals=xt.flatten().astype(np.float32))

    then_add_node = onnx.helper.make_node('Add',
                                          inputs=['x', 'xt'],
                                          outputs=['sub_out'])

    sub_body = onnx.helper.make_graph([then_add_node], 'sub_body', [],
                                      [then_out], [xt_tensor])

    cond_input = onnx.helper.make_tensor_value_info('cond',
                                                    onnx.TensorProto.BOOL, [2])
    ret = onnx.helper.make_tensor_value_info('ret', TensorProto.FLOAT, [])

    node = onnx.helper.make_node('If',
                                 inputs=['cond'],
                                 outputs=['ret'],
                                 then_branch=sub_body,
                                 else_branch=sub_body)

    return ([node], [cond_input, x], [ret])


@onnx_test
def if_param_test():
    then_out = onnx.helper.make_tensor_value_info('then_out',
                                                  onnx.TensorProto.FLOAT,
                                                  [2, 3])
    else_out = onnx.helper.make_tensor_value_info('else_out',
                                                  onnx.TensorProto.FLOAT,
                                                  [2, 3])
    x = onnx.helper.make_tensor_value_info('x', onnx.TensorProto.FLOAT, [2, 3])
    y = onnx.helper.make_tensor_value_info('y', onnx.TensorProto.FLOAT, [2, 3])

    yt = np.random.randn(2, 3).astype(np.float32)
    xt = np.random.randn(2, 3).astype(np.float32)
    xt_tensor = helper.make_tensor(name='xt',
                                   data_type=TensorProto.FLOAT,
                                   dims=xt.shape,
                                   vals=xt.flatten().astype(np.float32))
    yt_tensor = helper.make_tensor(name='yt',
                                   data_type=TensorProto.FLOAT,
                                   dims=yt.shape,
                                   vals=yt.flatten().astype(np.float32))

    then_add_node = onnx.helper.make_node('Add',
                                          inputs=['x', 'xt'],
                                          outputs=['then_out'])
    else_mul_node = onnx.helper.make_node('Mul',
                                          inputs=['y', 'yt'],
                                          outputs=['else_out'])

    then_body = onnx.helper.make_graph([then_add_node], 'then_body', [],
                                       [then_out], [xt_tensor])
    else_body = onnx.helper.make_graph([else_mul_node], 'else_body', [],
                                       [else_out], [yt_tensor])

    cond_input = onnx.helper.make_tensor_value_info('cond',
                                                    onnx.TensorProto.BOOL, [])
    ret = onnx.helper.make_tensor_value_info('ret', TensorProto.FLOAT, [])

    node = onnx.helper.make_node('If',
                                 inputs=['cond'],
                                 outputs=['ret'],
                                 then_branch=then_body,
                                 else_branch=else_body)

    return ([node], [cond_input, x, y], [ret])


@onnx_test
def if_pl_test():
    out_x = onnx.helper.make_tensor_value_info('out_x', onnx.TensorProto.FLOAT,
                                               [2, 3])
    out_l_x = onnx.helper.make_tensor_value_info('out_l_x',
                                                 onnx.TensorProto.FLOAT,
                                                 [2, 3])
    out_y = onnx.helper.make_tensor_value_info('out_y', onnx.TensorProto.FLOAT,
                                               [3, 3])
    out_l_y = onnx.helper.make_tensor_value_info('out_l_y',
                                                 onnx.TensorProto.FLOAT,
                                                 [3, 3])
    x = onnx.helper.make_tensor_value_info('x', onnx.TensorProto.FLOAT, [2, 3])
    y = onnx.helper.make_tensor_value_info('y', onnx.TensorProto.FLOAT, [3, 3])

    xt = np.array([[1, 2, 3], [4, 5, 6]]).astype(np.float32)
    yt = np.array([[8, 7, 6], [5, 4, 3], [2, 1, 0]]).astype(np.float32)
    xt_tensor = helper.make_tensor(name='xt',
                                   data_type=TensorProto.FLOAT,
                                   dims=xt.shape,
                                   vals=xt.flatten().astype(np.float32))
    yt_tensor = helper.make_tensor(name='yt',
                                   data_type=TensorProto.FLOAT,
                                   dims=yt.shape,
                                   vals=yt.flatten().astype(np.float32))

    then_add_node = onnx.helper.make_node('Add',
                                          inputs=['x', 'xt'],
                                          outputs=['out_x'])
    else_mul_node = onnx.helper.make_node('Mul',
                                          inputs=['y', 'yt'],
                                          outputs=['out_y'])

    then_const_node = onnx.helper.make_node(
        'Constant',
        inputs=[],
        outputs=['out_l_y'],
        value=onnx.numpy_helper.from_array(yt))

    else_const_node = onnx.helper.make_node(
        'Constant',
        inputs=[],
        outputs=['out_l_x'],
        value=onnx.numpy_helper.from_array(xt))

    then_body = onnx.helper.make_graph([then_add_node, then_const_node],
                                       'then_body', [], [out_x, out_l_y])
    else_body = onnx.helper.make_graph([else_mul_node, else_const_node],
                                       'else_body', [], [out_l_x, out_y])

    cond_input = onnx.helper.make_tensor_value_info('cond',
                                                    onnx.TensorProto.BOOL, [])
    ret = onnx.helper.make_tensor_value_info('ret', TensorProto.FLOAT, [])

    node = onnx.helper.make_node('If',
                                 inputs=['cond'],
                                 outputs=['ret'],
                                 then_branch=then_body,
                                 else_branch=else_body)

    return ([node], [cond_input, x, y], [ret], [xt_tensor, yt_tensor])


@onnx_test
def if_then_test():
    x = onnx.helper.make_tensor_value_info('x', onnx.TensorProto.FLOAT, [2, 3])
......
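For orientation, a sketch (not part of the diff) of how the if_pl_test graph above is expected to parse into MIGraphX modules, matching the if_pl_test parse test later on this page; module names follow the parser's If_<n>_if / If_<n>_else convention:

    main:      parameters cond, x, y; literals xt, yt
               ret = if(cond) { If_5_if, If_5_else }
    If_5_if:   returns {x + xt, literal yt}   // out_x, out_l_y
    If_5_else: returns {literal xt, y * yt}   // out_l_x, out_y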
@@ -1408,6 +1408,104 @@ TEST_CASE(if_else_test)
     EXPECT(p == prog);
 }
TEST_CASE(if_literal_test)
{
    migraphx::program p;
    auto* mm = p.get_main_module();
    migraphx::shape cond_s{migraphx::shape::bool_type};
    auto cond = mm->add_parameter("cond", cond_s);

    migraphx::shape s{migraphx::shape::float_type, {5}};

    auto* then_mod           = p.create_module("If_1_if");
    std::vector<float> data1 = {1, 2, 3, 4, 5};
    auto l1                  = then_mod->add_literal(migraphx::literal(s, data1));
    then_mod->add_return({l1});

    auto* else_mod           = p.create_module("If_1_else");
    std::vector<float> data2 = {5, 4, 3, 2, 1};
    auto l2                  = else_mod->add_literal(migraphx::literal(s, data2));
    else_mod->add_return({l2});

    auto ret = mm->add_instruction(migraphx::make_op("if"), {cond}, {then_mod, else_mod});
    mm->add_return({ret});

    auto prog = migraphx::parse_onnx("if_literal_test.onnx");
    EXPECT(p == prog);
}

TEST_CASE(if_param_excp_test)
{
    EXPECT(test::throws([&] { migraphx::parse_onnx("if_param_excp_test.onnx"); }));
}

TEST_CASE(if_param_excp1_test)
{
    EXPECT(test::throws([&] { migraphx::parse_onnx("if_param_excp1_test.onnx"); }));
}

TEST_CASE(if_param_test)
{
    migraphx::program p;
    auto* mm = p.get_main_module();
    migraphx::shape cond_s{migraphx::shape::bool_type};
    auto cond = mm->add_parameter("cond", cond_s);

    migraphx::shape ds{migraphx::shape::float_type, {2, 3}};
    auto x = mm->add_parameter("x", ds);
    auto y = mm->add_parameter("y", ds);

    auto* then_mod           = p.create_module("If_3_if");
    std::vector<float> data1 = {0.384804, -1.77948, -0.453775, 0.477438, -1.06333, -1.12893};
    auto l1                  = then_mod->add_literal(migraphx::literal(ds, data1));
    auto a1                  = then_mod->add_instruction(migraphx::make_op("add"), x, l1);
    then_mod->add_return({a1});

    auto* else_mod           = p.create_module("If_3_else");
    std::vector<float> data2 = {-0.258047, 0.360394, 0.536804, -0.577762, 1.0217, 1.02442};
    auto l2                  = else_mod->add_literal(migraphx::literal(ds, data2));
    auto a2                  = else_mod->add_instruction(migraphx::make_op("mul"), y, l2);
    else_mod->add_return({a2});

    auto ret = mm->add_instruction(migraphx::make_op("if"), {cond}, {then_mod, else_mod});
    mm->add_return({ret});

    auto prog = migraphx::parse_onnx("if_param_test.onnx");
    EXPECT(p == prog);
}

TEST_CASE(if_pl_test)
{
    migraphx::program p;
    auto* mm = p.get_main_module();
    migraphx::shape cond_s{migraphx::shape::bool_type};
    migraphx::shape xs{migraphx::shape::float_type, {2, 3}};
    migraphx::shape ys{migraphx::shape::float_type, {3, 3}};
    std::vector<float> datax = {1, 2, 3, 4, 5, 6};
    std::vector<float> datay = {8, 7, 6, 5, 4, 3, 2, 1, 0};

    auto lx   = mm->add_literal(migraphx::literal(xs, datax));
    auto ly   = mm->add_literal(migraphx::literal(ys, datay));
    auto cond = mm->add_parameter("cond", cond_s);
    auto x    = mm->add_parameter("x", xs);
    auto y    = mm->add_parameter("y", ys);

    auto* then_mod = p.create_module("If_5_if");
    auto l1        = then_mod->add_literal(migraphx::literal(ys, datay));
    auto a1        = then_mod->add_instruction(migraphx::make_op("add"), x, lx);
    then_mod->add_return({a1, l1});

    auto* else_mod = p.create_module("If_5_else");
    auto l2        = else_mod->add_literal(migraphx::literal(xs, datax));
    auto a2        = else_mod->add_instruction(migraphx::make_op("mul"), y, ly);
    else_mod->add_return({l2, a2});

    auto ret = mm->add_instruction(migraphx::make_op("if"), {cond}, {then_mod, else_mod});
    mm->add_return({ret});

    auto prog = migraphx::parse_onnx("if_pl_test.onnx");
    EXPECT(p == prog);
}

TEST_CASE(if_then_test)
{
    migraphx::program p;
......
@@ -87,6 +87,79 @@ TEST_CASE(if_else_test)
     EXPECT(migraphx::verify_range(result_vector, gold));
 }
TEST_CASE(if_literal_test)
{
    auto run_prog = [](bool cond) {
        migraphx::program p = migraphx::parse_onnx("if_literal_test.onnx");
        p.compile(migraphx::ref::target{});
        migraphx::shape s_data{migraphx::shape::bool_type};
        std::vector<char> data = {static_cast<char>(cond)};
        migraphx::parameter_map pp;
        pp["cond"] = migraphx::argument(s_data, data.data());

        auto result = p.eval(pp).back();
        std::vector<float> result_vector;
        result.visit([&](auto output) { result_vector.assign(output.begin(), output.end()); });

        return result_vector;
    };

    // then branch
    {
        auto result_vector      = run_prog(true);
        std::vector<float> gold = {1, 2, 3, 4, 5};
        EXPECT(migraphx::verify_range(result_vector, gold));
    }

    // else branch
    {
        auto result_vector      = run_prog(false);
        std::vector<float> gold = {5, 4, 3, 2, 1};
        EXPECT(migraphx::verify_range(result_vector, gold));
    }
}

TEST_CASE(if_pl_test)
{
    auto run_prog = [](bool cond) {
        migraphx::program p = migraphx::parse_onnx("if_pl_test.onnx");
        p.compile(migraphx::ref::target{});
        migraphx::shape xs{migraphx::shape::float_type, {2, 3}};
        migraphx::shape ys{migraphx::shape::float_type, {3, 3}};
        migraphx::shape cond_s{migraphx::shape::bool_type};
        std::vector<float> x_data(xs.elements(), 1.0f);
        std::vector<float> y_data(ys.elements(), 2.0f);
        std::vector<char> cond_data{static_cast<char>(cond)};
        migraphx::parameter_map pp;
        pp["x"]    = migraphx::argument(xs, x_data.data());
        pp["y"]    = migraphx::argument(ys, y_data.data());
        pp["cond"] = migraphx::argument(cond_s, cond_data.data());

        auto result = p.eval(pp).back();
        std::vector<float> ret;
        result.visit([&](auto output) { ret.assign(output.begin(), output.end()); });

        return ret;
    };

    // then branch
    {
        auto result_vector      = run_prog(true);
        std::vector<float> gold = {2, 3, 4, 5, 6, 7};
        EXPECT(migraphx::verify_range(result_vector, gold));
    }

    // else branch
    {
        auto result_vector      = run_prog(false);
        std::vector<float> gold = {1, 2, 3, 4, 5, 6};
        EXPECT(migraphx::verify_range(result_vector, gold));
    }
}
TEST_CASE(instance_norm_test)
{
    migraphx::program p = migraphx::parse_onnx("instance_norm_val_test.onnx");
......
@@ -72,7 +72,29 @@ def test_fp16_imagescaler():
     print(r)
def test_if_pl():
    p = migraphx.parse_onnx("if_pl_test.onnx")
    print(p)
    s1 = p.get_output_shapes()[-1]
    print("Compiling ...")
    p.compile(migraphx.get_target("gpu"))
    print(p)
    s2 = p.get_output_shapes()[-1]
    assert s1 == s2

    params = {}
    shapes = p.get_parameter_shapes()
    params["x"] = np.ones(6).reshape(shapes["x"].lens()).astype(np.float32)
    params["y"] = np.array([2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0,
                            2.0]).reshape(shapes["y"].lens()).astype(np.float32)
    params["cond"] = np.array([1]).reshape(()).astype(bool)

    r = p.run(params)[-1]
    print(r)
test_conv_relu()
test_sub_uint64()
test_neg_int64()
test_fp16_imagescaler()
test_if_pl()