Merge branch 'develop' into doc-standard

5907abd8 · Sam Wu · cb6563d0 · e3e00547
Commit 5907abd8 authored Dec 05, 2023 by Sam Wu
20 changed files
--- a/docs/dev/data.rst
+++ b/docs/dev/data.rst
@@ -5,26 +5,36 @@ shape
 -----
 .. doxygenstruct:: migraphx::internal::shape
+   :members:
+   :undoc-members:
 literal
 -------
 .. doxygenstruct:: migraphx::internal::literal
+   :members:
+   :undoc-members:
 argument
 --------
 .. doxygenstruct:: migraphx::internal::argument
+   :members:
+   :undoc-members:
 raw_data
 --------
 .. doxygenstruct:: migraphx::internal::raw_data
+   :members:
+   :undoc-members:
-.. doxygenfunction:: migraphx::internal::visit_all
+.. doxygenfunction:: template<class T, class ...Ts> auto migraphx::internal::visit_all(T &&x, Ts&&... xs)
 tensor_view
 -----------
 .. doxygenstruct:: migraphx::internal::tensor_view
+   :members:
+   :undoc-members:
--- a/docs/dev/dev_intro.rst
+++ b/docs/dev/dev_intro.rst
@@ -18,8 +18,8 @@ Directions for building MIGraphX from source can be found in the main README fil
 Adding Two Literals
 --------------------
-A program is a collection of modules, which are collections of instructions to be executed when calling `eval <migraphx::program::eval>`.
+A program is a collection of modules, which are collections of instructions to be executed when calling :cpp:any:`eval <migraphx::internal::program::eval>`.
-Each instruction has an associated `operation <migraphx::operation>` which represents the computation to be performed by the instruction.
+Each instruction has an associated :cpp:any:`operation <migraphx::internal::operation>` which represents the computation to be performed by the instruction.
 We start with a snippet of the simple ``add_two_literals()`` function::
@@ -41,14 +41,14 @@ We start with a snippet of the simple ``add_two_literals()`` function::
    auto result = p.eval({}).back();
    std::cout << "add_two_literals: 1 + 2 = " << result << "\n";
-We start by creating a simple ``migraphx::program`` object and then getting a pointer to the main module of it.
+We start by creating a simple :cpp:any:`migraphx::program <migraphx::internal::program>` object and then getting a pointer to the main module of it.
 The program is a collection of ``modules`` that start executing from the main module, so instructions are added to the modules rather than directly onto the program object.
-We then use the `add_literal <migraphx::program::add_literal>` function to add an instruction that stores the literal number ``1`` while returning an `instruction_ref <migraphx::instruction_ref>`.
+We then use the :cpp:any:`add_literal <migraphx::internal::program::add_literal>` function to add an instruction that stores the literal number ``1`` while returning an :cpp:any:`instruction_ref <migraphx::internal::instruction_ref>`.
-The returned `instruction_ref <migraphx::instruction_ref>` can be used in another instruction as an input.
+The returned :cpp:any:`instruction_ref <migraphx::internal::instruction_ref>` can be used in another instruction as an input.
-We use the same `add_literal <migraphx::program::add_literal>` function to add a ``2`` to the program.
+We use the same :cpp:any:`add_literal <migraphx::internal::program::add_literal>` function to add a ``2`` to the program.
 After creating the literals, we then create the instruction to add the numbers together.
-This is done by using the `add_instruction <migraphx::program::add_instruction>` function with the ``"add"`` `operation <migraphx::program::operation>` created by `make_op <migraphx::program::make_op>` along with the previous `add_literal` `instruction_ref <migraphx::instruction_ref>` for the input arguments of the instruction.
+This is done by using the :cpp:any:`add_instruction <migraphx::internal::program::add_instruction>` function with the ``"add"`` :cpp:any:`operation <migraphx::internal::program::operation>` created by :cpp:any:`make_op <migraphx::internal::program::make_op>` along with the previous `add_literal` :cpp:any:`instruction_ref <migraphx::internal::instruction_ref>` for the input arguments of the instruction.
-Finally, we can run this `program <migraphx::program>` by compiling it for the reference target (CPU) and then running it with `eval <migraphx::program::eval>`
+Finally, we can run this :cpp:any:`program <migraphx::internal::program>` by compiling it for the reference target (CPU) and then running it with :cpp:any:`eval <migraphx::internal::program::eval>`
 The result is then retrieved and printed to the console.
 We can compile the program for the GPU as well, but the file will have to be moved to the ``test/gpu/`` directory and the correct target must be included::
@@ -76,8 +76,8 @@ We can modify the program to take an input parameter ``x``, as seen in the ``add
    p.compile(migraphx::ref::target{});
 This adds a parameter of type ``int32``, and compiles it for the CPU.
-To run the program, we need to pass the parameter as a ``parameter_map`` when we call `eval <migraphx::program::eval>`.
+To run the program, we need to pass the parameter as a ``parameter_map`` when we call :cpp:any:`eval <migraphx::internal::program::eval>`.
-We create the ``parameter_map`` by setting the ``x`` key to an `argument <migraphx::argument>` object with an ``int`` data type::
+We create the ``parameter_map`` by setting the ``x`` key to an :cpp:any:`argument <migraphx::internal::argument>` object with an ``int`` data type::
    // create a parameter_map object for passing a value to the "x" parameter
    std::vector<int> data = {4};
@@ -92,7 +92,7 @@ We create the ``parameter_map`` by setting the ``x`` key to an `argument <migrap
 Handling Tensor Data
 ---------------------
-In the previous examples we have only been dealing with scalars, but the `shape <migraphx::shape>` class can describe multi-dimensional tensors.
+In the previous examples we have only been dealing with scalars, but the :cpp:any:`shape <migraphx::internal::shape>` class can describe multi-dimensional tensors.
 For example, we can compute a simple convolution::
    migraphx::program p;
@@ -109,7 +109,7 @@ For example, we can compute a simple convolution::
 Here we create two parameters for both the ``input`` and ``weights``.
 In the previous examples, we created simple literals, however, most programs will take data from allocated buffers (usually on the GPU).
-In this case, we can create `argument <migraphx::argument>` objects directly from the pointers to the buffers::
+In this case, we can create :cpp:any:`argument <migraphx::internal::argument>` objects directly from the pointers to the buffers::
    // Compile the program
    p.compile(migraphx::ref::target{});
@@ -133,8 +133,8 @@ In this case, we can create `argument <migraphx::argument>` objects directly fro
    EXPECT(migraphx::verify::verify_rms_range(results_vector, sol));
-An `argument <migraphx::argument>` can handle memory buffers from either the GPU or the CPU.
+An :cpp:any:`argument <migraphx::internal::argument>` can handle memory buffers from either the GPU or the CPU.
-By default when running the `program <migraphx::program>`, buffers are allocated on the corresponding target.
+By default when running the :cpp:any:`program <migraphx::internal::program>`, buffers are allocated on the corresponding target.
 When compiling for the CPU, the buffers by default will be allocated on the CPU.
 When compiling for the GPU, the buffers by default will be allocated on the GPU.
 With the option ``offload_copy=true`` set while compiling for the GPU, the buffers will be located on the CPU.
@@ -143,7 +143,7 @@ With the option ``offload_copy=true`` set while compiling for the GPU, the buffe
 Importing From ONNX
 --------------------
-A `program <migraphx::program>` can be built directly from an onnx file using the MIGraphX ONNX parser.
+A :cpp:any:`program <migraphx::internal::program>` can be built directly from an onnx file using the MIGraphX ONNX parser.
 This makes it easier to use neural networks directly from other frameworks.
 In this case, there is a ``parse_onnx`` function::

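Reviewer sketch for the `dev_intro.rst` walkthrough above: the text describes a module as an ordered collection of instructions, where each `add_literal` / `add_instruction` call returns a reference that later instructions can use as an input. The toy model below illustrates that idea only. Every name in it (`toy_module`, `ref`, the string-based `"add"` dispatch) is hypothetical and is not the MIGraphX API; a real program also goes through `compile` for a target, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a module: an ordered list of instructions.
// All names here are hypothetical and are not the MIGraphX API.
struct toy_module
{
    // Index of an instruction in the module; plays the role of instruction_ref.
    using ref = std::size_t;

    struct instruction
    {
        std::string op;          // operation name: "literal" or "add"
        std::vector<ref> inputs; // earlier instructions used as arguments
        int value;               // payload, only meaningful for literals
    };

    std::vector<instruction> instructions;

    // Store a constant and return a reference to it.
    ref add_literal(int v)
    {
        instructions.push_back({"literal", {}, v});
        return instructions.size() - 1;
    }

    // Add a computation whose inputs are previously added instructions.
    ref add_instruction(const std::string& op, std::vector<ref> inputs)
    {
        for(ref r : inputs)
            assert(r < instructions.size()); // inputs must already exist
        instructions.push_back({op, std::move(inputs), 0});
        return instructions.size() - 1;
    }

    // Evaluate instructions in order and return the result of the last one.
    int eval() const
    {
        if(instructions.empty())
            throw std::runtime_error("empty module");
        std::vector<int> results;
        for(const auto& ins : instructions)
        {
            if(ins.op == "literal")
                results.push_back(ins.value);
            else if(ins.op == "add")
                results.push_back(results.at(ins.inputs.at(0)) + results.at(ins.inputs.at(1)));
            else
                throw std::runtime_error("unknown op: " + ins.op);
        }
        return results.back();
    }
};
```

Adding the literals `1` and `2` and then an `"add"` instruction over the two returned references mirrors the `add_two_literals()` flow in the documentation, with the last instruction's value taken as the result.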
--- a/docs/dev/operators.rst
+++ b/docs/dev/operators.rst
@@ -5,6 +5,8 @@ operation
 ---------
 .. doxygenstruct:: migraphx::internal::operation
+   :members:
+   :undoc-members:
 .. doxygenfunction:: migraphx::internal::is_context_free
@@ -14,3 +16,5 @@ operators
 ---------
 .. doxygennamespace:: migraphx::internal::op
+   :members:
+   :undoc-members:
--- a/docs/dev/pass.rst
+++ b/docs/dev/pass.rst
@@ -5,63 +5,82 @@ pass
 ----
 .. doxygenstruct:: migraphx::internal::pass
+   :members:
+   :undoc-members:
 dead_code_elimination
 ---------------------
 .. doxygenstruct:: migraphx::internal::dead_code_elimination
+   :members:
+   :undoc-members:
 eliminate_common_subexpression
 ------------------------------
 .. doxygenstruct:: migraphx::internal::eliminate_common_subexpression
+   :members:
+   :undoc-members:
 eliminate_concat
 ----------------
 .. doxygenstruct:: migraphx::internal::eliminate_concat
+   :members:
+   :undoc-members:
 eliminate_contiguous
 --------------------
 .. doxygenstruct:: migraphx::internal::eliminate_contiguous
+   :members:
+   :undoc-members:
 eliminate_identity
 ------------------
 .. doxygenstruct:: migraphx::internal::eliminate_identity
+   :members:
+   :undoc-members:
 eliminate_pad
 -------------
 .. doxygenstruct:: migraphx::internal::eliminate_pad
+   :members:
+   :undoc-members:
 propagate_constant
 ------------------
 .. doxygenstruct:: migraphx::internal::propagate_constant
+   :members:
-rewrite_batchnorm
+   :undoc-members:
-----------------
-.. doxygenstruct:: migraphx::internal::rewrite_batchnorm
 rewrite_rnn
 -----------
 .. doxygenstruct:: migraphx::internal::rewrite_rnn
+   :members:
+   :undoc-members:
 schedule
 --------
 .. doxygenstruct:: migraphx::internal::schedule
+   :members:
+   :undoc-members:
 simplify_algebra
 ----------------
 .. doxygenstruct:: migraphx::internal::simplify_algebra
+   :members:
+   :undoc-members:
 simplify_reshapes
 -----------------
 .. doxygenstruct:: migraphx::internal::simplify_reshapes
+   :members:
+   :undoc-members:
--- a/docs/dev/program.rst
+++ b/docs/dev/program.rst
@@ -5,6 +5,8 @@ instruction
 -----------
 .. doxygenstruct:: migraphx::internal::instruction
+   :members:
+   :undoc-members:
 instruction_ref
 ---------------
@@ -17,6 +19,8 @@ program
 -------
 .. doxygenstruct:: migraphx::internal::program
+   :members:
+   :undoc-members:
 parse_onnx
 ----------

--- a/docs/dev/targets.rst
+++ b/docs/dev/targets.rst
@@ -5,14 +5,20 @@ target
 ------
 .. doxygenstruct:: migraphx::internal::target
+   :members:
+   :undoc-members:
 gpu::target
 -----------
 .. doxygenstruct:: migraphx::internal::gpu::target
+   :members:
+   :undoc-members:
 cpu::target
 -----------
 .. doxygenstruct:: migraphx::internal::cpu::target
+   :members:
+   :undoc-members:
--- a/docs/doxygen/Doxyfile
+++ b/docs/doxygen/Doxyfile
@@ -28,7 +28,14 @@ MACRO_EXPANSION = YES
 OUTPUT_DIRECTORY = ../_build/doxygen
-PREDEFINED = DOXYGEN
+PREDEFINED = \
+    DOXYGEN \
+    MIGRAPHX_EXPORT= \
+    MIGRAPHX_API_EXPORT= \
+    MIGRAPHX_GPU_EXPORT= \
+    MIGRAPHX_CPU_EXPORT= \
+    MIGRAPHX_ONNX_EXPORT= \
+    MIGRAPHX_TF_EXPORT= \
 PROJECT_NAME = MIGraphX

--- a/docs/reference/cpp.rst
+++ b/docs/reference/cpp.rst
@@ -8,45 +8,65 @@ shape
 .. doxygenenum:: migraphx_shape_datatype_t
 .. doxygenstruct:: migraphx::shape
+   :members:
+   :undoc-members:
 argument
 --------
 .. doxygenstruct:: migraphx::argument
+   :members:
+   :undoc-members:
 target
 ------
 .. doxygenstruct:: migraphx::target
+   :members:
+   :undoc-members:
 program
 -------
 .. doxygenstruct:: migraphx::program_parameter_shapes
+   :members:
+   :undoc-members:
 .. doxygenstruct:: migraphx::program_parameters
+   :members:
+   :undoc-members:
 .. doxygenstruct:: migraphx_compile_options
+   :members:
+   :undoc-members:
 .. doxygenstruct:: migraphx::program
+   :members:
+   :undoc-members:
 quantize
 --------
 .. doxygenstruct:: migraphx::quantize_op_names
+   :members:
+   :undoc-members:
 .. doxygenfunction:: migraphx::quantize_fp16(const program&)
 .. doxygenfunction:: migraphx::quantize_fp16(const program&, const quantize_op_names&)
 .. doxygenstruct:: migraphx::quantize_int8_options
+   :members:
+   :undoc-members:
-.. doxygenfunction:: migraphx::quantize_int8
+.. doxygenfunction::migraphx::quantize_int8
 parse_onnx
 ----------
 .. doxygenstruct:: migraphx::onnx_options
+   :members:
+   :undoc-members:
 .. doxygenfunction:: migraphx::parse_onnx(const char *)
@@ -63,16 +83,18 @@ parse_onnx
 load
 ----
-.. doxygenstruct:: migraphx_file_options
+.. doxygenstruct:: migraphx::file_options
+   :members:
+   :undoc-members:
 .. doxygenfunction:: migraphx::load(const char *)
-.. doxygenfunction:: migraphx::load(const char *, migraphx_file_options)
+.. doxygenfunction:: migraphx::load(const char *, const file_options&)
 save
 ----
 .. doxygenfunction:: migraphx::save(const program&, const char *)
-.. doxygenfunction:: migraphx::save(const program&, const char *, migraphx_file_options)
+.. doxygenfunction:: migraphx::save(const program&, const char *, const file_options&)
--- a/requirements.txt
+++ b/requirements.txt
@@ -29,4 +29,4 @@ pybind/pybind11@d159a563383d10c821ba7b2a71905d1207db6de4 --build
 msgpack/msgpack-c@cpp-3.3.0 -DMSGPACK_BUILD_TESTS=Off
 sqlite3@3.43.2 -DCMAKE_POSITION_INDEPENDENT_CODE=On
 ROCmSoftwarePlatform/composable_kernel@70eefcf4f263aa5c25f3c9ff0db8f6f199ef0fb9 -DCK_BUILD_JIT_LIB=On -DCMAKE_POSITION_INDEPENDENT_CODE=On
-ROCmSoftwarePlatform/rocMLIR@9e66e8050209f03349a41b6b497f0da2b285a53b -DBUILD_FAT_LIBROCKCOMPILER=On
+ROCmSoftwarePlatform/rocMLIR@a6880f1e6daec99876cd6a4820fbc69c57216401 -DBUILD_FAT_LIBROCKCOMPILER=On
--- a/src/onnx/onnx.proto
+++ b/src/onnx/onnx.proto
@@ -3,8 +3,8 @@
 //
-// Copyright (c) ONNX Project Contributors.
+// SPDX-License-Identifier: Apache-2.0
-// Licensed under the MIT license.
 syntax = "proto2";
@@ -27,13 +27,6 @@ package onnx_for_migraphx;
 // Notes
 //
-// Release
-//
-// We are still in the very early stage of defining ONNX. The current
-// version of ONNX is a starting point. While we are actively working
-// towards a complete spec, we would like to get the community involved
-// by sharing our working version of ONNX.
-//
 // Protobuf compatibility
 //
 // To simplify framework compatibility, ONNX is defined using the subset of protobuf
@@ -92,15 +85,28 @@ enum Version {
  //   - Add sparse initializers
  IR_VERSION_2019_9_19 = 0x0000000000000006;
-  // IR VERSION 7 published on <TBD>
+  // IR VERSION 7 published on May 8, 2020
+  // - Add support to allow function body graph to rely on multiple external operator sets.
  // - Add a list to promote inference graph's initializers to global and
  //   mutable variables. Global variables are visible in all graphs of the
  //   stored models.
  // - Add message TrainingInfoProto to store initialization
  //   method and training algorithm. The execution of TrainingInfoProto
  //   can modify the values of mutable variables.
-  // - Make inference graph callable from TrainingInfoProto via GraphCall operator.
+  // - Implicitly add inference graph into each TrainingInfoProto's algorithm.
-  IR_VERSION = 0x0000000000000007;
+  IR_VERSION_2020_5_8 = 0x0000000000000007;
+  // IR VERSION 8 published on July 30, 2021
+  // Introduce TypeProto.SparseTensor
+  // Introduce TypeProto.Optional
+  // Added a list of FunctionProtos local to the model
+  // Deprecated since_version and operator status from FunctionProto
+  IR_VERSION_2021_7_30 = 0x0000000000000008;
+  // IR VERSION 9 published on TBD
+  // Added AttributeProto to FunctionProto so that default attribute values can be set.
+  // Added FLOAT8E4M3FN, FLOAT8E4M3FNUZ, FLOAT8E5M2, FLOAT8E5M2FNUZ.
+  IR_VERSION = 0x0000000000000009;
 }
 // Attributes
@@ -121,6 +127,7 @@ message AttributeProto {
    TENSOR = 4;
    GRAPH = 5;
    SPARSE_TENSOR = 11;
+    TYPE_PROTO = 13;
    FLOATS = 6;
    INTS = 7;
@@ -128,6 +135,7 @@ message AttributeProto {
    TENSORS = 9;
    GRAPHS = 10;
    SPARSE_TENSORS = 12;
+    TYPE_PROTOS = 14;
  }
  // The name field MUST be present for this version of the IR.
@@ -159,6 +167,7 @@ message AttributeProto {
  optional SparseTensorProto sparse_tensor = 22;  // sparse tensor value
  // Do not use field below, it's deprecated.
  // optional ValueProto v = 12;         // value - subsumes everything but graph
+  optional TypeProto tp = 14;          // type proto
  repeated float floats = 7;          // list of floats
  repeated int64 ints = 8;            // list of ints
@@ -166,6 +175,7 @@ message AttributeProto {
  repeated TensorProto tensors = 10;  // list of tensors
  repeated GraphProto graphs = 11;    // list of graph
  repeated SparseTensorProto sparse_tensors = 23; // list of sparse tensors
+  repeated TypeProto type_protos = 15;// list of type protos
 }
 // Defines information on value, including the name, the type, and
@@ -211,7 +221,7 @@ message NodeProto {
 // TrainingInfoProto stores information for training a model.
 // In particular, this defines two functionalities: an initialization-step
 // and a training-algorithm-step. Initialization resets the model
-// back to its original state as if no training has been consumed.
+// back to its original state as if no training has been performed.
 // Training algorithm improves the model based on input data.
 //
 // The semantics of the initialization-step is that the initializers
@@ -224,8 +234,8 @@ message NodeProto {
 // training algorithm's step. After the execution of a
 // TrainingInfoProto.algorithm, the initializers specified by "update_binding"
 // may be immediately updated. If the targeted training algorithm contains
-// consecutive update stages (such as block coordinate descent methods),
+// consecutive update steps (such as block coordinate descent methods),
-// the user needs to create a TrainingInfoProto for each stage.
+// the user needs to create a TrainingInfoProto for each step.
 message TrainingInfoProto {
  // This field describes a graph to compute the initial tensors
  // upon starting the training process. Initialization graph has no input
@@ -239,20 +249,38 @@ message TrainingInfoProto {
  // iteration to zero.
  //
  // By default, this field is an empty graph and its evaluation does not
-  // produce any output.
+  // produce any output. Thus, no initializer would be changed by default.
  optional GraphProto initialization = 1;
  // This field represents a training algorithm step. Given required inputs,
  // it computes outputs to update initializers in its own or inference graph's
-  // initializer lists. In general, this graph contains loss node, gradient node,
+  // initializer lists. In general, this field contains loss node, gradient node,
-  // optimizer node, increment of iteration count, and some calls to the inference
+  // optimizer node, increment of iteration count.
-  // graph.
  //
-  // The field algorithm.node is the only place the user can use GraphCall
+  // An execution of the training algorithm step is performed by executing the
-  // operator. The only callable graph is the one stored in ModelProto.graph.
+  // graph obtained by combining the inference graph (namely "ModelProto.graph")
+  // and the "algorithm" graph. That is, the actual
+  // input/initializer/output/node/value_info/sparse_initializer list of
+  // the training graph is the concatenation of
+  // "ModelProto.graph.input/initializer/output/node/value_info/sparse_initializer"
+  // and "algorithm.input/initializer/output/node/value_info/sparse_initializer"
+  // in that order. This combined graph must satisfy the normal ONNX conditions.
+  // Now, let's provide a visualization of graph combination for clarity.
+  // Let the inference graph (i.e., "ModelProto.graph") be
+  //    tensor_a, tensor_b -> MatMul -> tensor_c -> Sigmoid -> tensor_d
+  // and the "algorithm" graph be
+  //    tensor_d -> Add -> tensor_e
+  // The combination process results in
+  //    tensor_a, tensor_b -> MatMul -> tensor_c -> Sigmoid -> tensor_d -> Add -> tensor_e
+  //
+  // Notice that an input of a node in the "algorithm" graph may reference the
+  // output of a node in the inference graph (but not the other way round). Also, inference
+  // node cannot reference inputs of "algorithm". With these restrictions, inference graph
+  // can always be run independently without training information.
  //
  // By default, this field is an empty graph and its evaluation does not
-  // produce any output.
+  // produce any output. Evaluating the default training step never
+  // updates any initializers.
  optional GraphProto algorithm = 2;
  // This field specifies the bindings from the outputs of "initialization" to
@@ -284,23 +312,16 @@ message TrainingInfoProto {
  // be multiple key-value pairs in "update_binding".
  //
   // The initializers that appear as keys in "update_binding" are considered
-  // mutable and globally-visible variables. This implies some behaviors
+  // mutable variables. This implies some behaviors
  // as described below.
  //
-  //  1. We have only unique keys in all "update_binding"s so that two global
+  //  1. We have only unique keys in all "update_binding"s so that two
  //     variables may not have the same name. This ensures that one
-  //     global variable is assigned up to once.
+  //     variable is assigned up to once.
  //  2. The keys must appear in names of "ModelProto.graph.initializer" or
  //     "TrainingInfoProto.algorithm.initializer".
-  //  3. The values must be output names of "algorithm".
+  //  3. The values must be output names of "algorithm" or "ModelProto.graph.output".
-  //  4. If an optional input of a graph is omitted when using GraphCall, the
+  //  4. Mutable variables are initialized to the value specified by the
-  //     global variable with the same name may be used.
-  //  5. When using GraphCall, the users always can pass values to optional 
-  //     inputs of the called graph even if the associated initializers appears
-  //     as keys in "update_binding"s.
-  //  6. The graphs in TrainingInfoProto's can use global variables as
-  //     their operator inputs.
-  //  7. Mutable variables are initialized to the value specified by the
  //     corresponding initializer, and then potentially updated by
  //     "initializer_binding"s and "update_binding"s in "TrainingInfoProto"s.
  //
@@ -375,13 +396,31 @@ message ModelProto {
  //
  // If this field is empty, the training behavior of the model is undefined.
  repeated TrainingInfoProto training_info = 20;
+  // A list of function protos local to the model.
+  //
+  // Name of the function "FunctionProto.name" should be unique within the domain "FunctionProto.domain".
+  // In case of any conflicts the behavior (whether the model local functions are given higher priority,
+  // or standard operator sets are given higher priority or this is treated as error) is defined by
+  // the runtimes.
+  //
+  // The operator sets imported by FunctionProto should be compatible with the ones
+  // imported by ModelProto and other model local FunctionProtos.
+  // Example, if same operator set say 'A' is imported by a FunctionProto and ModelProto
+  // or by 2 FunctionProtos then versions for the operator set may be different but,
+  // the operator schema returned for op_type, domain, version combination
+  // for both the versions should be same for every node in the function body.
+  //
+  // One FunctionProto can reference other FunctionProto in the model, however, recursive reference
+  // is not allowed.
+  repeated FunctionProto functions = 25;
 };
 // StringStringEntryProto follows the pattern for cross-proto-version maps.
 // See https://developers.google.com/protocol-buffers/docs/proto3#maps
 message StringStringEntryProto {
  optional string key = 1;
-  optional string value= 2;
+  optional string value = 2;
 };
 message TensorAnnotation {
@@ -409,8 +448,9 @@ message GraphProto {
  optional string name = 2;   // namespace Graph
  // A list of named tensor values, used to specify constant inputs of the graph.
-  // Each TensorProto entry must have a distinct name (within the list) that
+  // Each initializer (both TensorProto as well as SparseTensorProto) MUST have a name.
-  // MAY also appear in the input list.
+  // The name MUST be unique across both initializer and sparse_initializer,
+  // but the name MAY also appear in the input list.
  repeated TensorProto initializer = 5;
  // Initializers (see above) stored in sparse format.
@@ -433,13 +473,8 @@ message GraphProto {
  // which means, tensor 'a_scale' and tensor 'a_zero_point' are scale and zero point of tensor 'a' in the model.
  repeated TensorAnnotation quantization_annotation = 14;
-  // DO NOT USE the following fields, they were deprecated from earlier versions.
+  reserved 3, 4, 6 to 9;
-  // repeated string input = 3;
+  reserved "ir_version", "producer_version", "producer_tag", "domain";
-  // repeated string output = 4;
-  // optional int64 ir_version = 6;
-  // optional int64 producer_version = 7;
-  // optional string producer_tag = 8;
-  // optional string domain = 9;
 }
 // Tensors
@@ -474,6 +509,17 @@ message TensorProto {
    // This format has 1 sign bit, 8 exponent bits, and 7 mantissa bits.
    BFLOAT16 = 16;
+    // Non-IEEE floating-point format based on papers
+    // FP8 Formats for Deep Learning, https://arxiv.org/abs/2209.05433,
+    // 8-bit Numerical Formats For Deep Neural Networks, https://arxiv.org/pdf/2206.02915.pdf.
+    // Operators that support FP8 are Cast, CastLike, QuantizeLinear, DequantizeLinear.
+    // The computation usually happens inside a block quantize / dequantize
+    // fused by the runtime.
+    FLOAT8E4M3FN = 17;    // float 8, mostly used for coefficients, supports nan, not inf 
+    FLOAT8E4M3FNUZ = 18;  // float 8, mostly used for coefficients, supports nan, not inf, no negative zero 
+    FLOAT8E5M2 = 19;      // follows IEEE 754, supports nan, inf, mostly used for gradients
+    FLOAT8E5M2FNUZ = 20;  // follows IEEE 754, supports nan, inf, mostly used for gradients, no negative zero
    // Future extensions go here.
  }
@@ -507,11 +553,11 @@ message TensorProto {
  // When this field is present, the data_type field MUST be FLOAT or COMPLEX64.
  repeated float float_data = 4 [packed = true];
-  // For int32, uint8, int8, uint16, int16, bool, and float16 values
+  // For int32, uint8, int8, uint16, int16, bool, float8, and float16 values
-  // float16 values must be bit-wise converted to an uint16_t prior
+  // float16 and float8 values must be bit-wise converted to an uint16_t prior
  // to writing to the buffer.
  // When this field is present, the data_type field MUST be
-  // INT32, INT16, INT8, UINT16, UINT8, BOOL, or FLOAT16
+  // INT32, INT16, INT8, UINT16, UINT8, BOOL, FLOAT16, BFLOAT16, FLOAT8E4M3FN, FLOAT8E4M3FNUZ, FLOAT8E5M2, FLOAT8E5M2FNUZ
  repeated int32 int32_data = 5 [packed = true];
  // For strings.
@@ -589,6 +635,8 @@ message TensorProto {
 message SparseTensorProto {
  // The sequence of non-default values are encoded as a tensor of shape [NNZ].
  // The default-value is zero for numeric tensors, and empty-string for string tensors.
+  // values must have a non-empty name present which serves as a name for SparseTensorProto
+  // when used in sparse_initializer list.
  optional TensorProto values = 1;
  // The indices of the non-default values, which may be stored in one of two formats.
@@ -619,7 +667,7 @@ message TensorShapeProto {
    // Standard denotation can optionally be used to denote tensor
    // dimensions with standard semantic descriptions to ensure
    // that operations are applied to the correct axis of a tensor.
-    // Refer to https://github.com/onnx/onnx/blob/master/docs/DimensionDenotation.md#denotation-definition
+    // Refer to https://github.com/onnx/onnx/blob/main/docs/DimensionDenotation.md#denotation-definition
    // for pre-defined dimension denotations.
    optional string denotation = 3;
  };
@@ -656,6 +704,23 @@ message TypeProto {
    optional TypeProto value_type = 2;
  };
+  // wrapper for Tensor, Sequence, or Map
+  message Optional {
+    // The type and optional shape of the element wrapped.
+    // This field MUST be present for this version of the IR.
+    // Possible values correspond to OptionalProto.DataType enum
+    optional TypeProto elem_type = 1;
+  };
+  message SparseTensor {
+    // This field MUST NOT have the value of UNDEFINED
+    // This field MUST have a valid TensorProto.DataType value
+    // This field MUST be present for this version of the IR.
+    optional int32 elem_type = 1;
+    optional TensorShapeProto shape = 2;
+  }
  oneof value {
    // The type of a tensor.
@@ -672,11 +737,18 @@ message TypeProto {
    // The type of a map.
    Map map_type = 5;
+    // The type of an optional.
+    Optional optional_type = 9;
+    // Type of the sparse tensor
+    SparseTensor sparse_tensor_type = 8;
  }
  // An optional denotation can be used to denote the whole
  // type with a standard semantic description as to what is
-  // stored inside. Refer to https://github.com/onnx/onnx/blob/master/docs/TypeDenotation.md#type-denotation-definition
+  // stored inside. Refer to https://github.com/onnx/onnx/blob/main/docs/TypeDenotation.md#type-denotation-definition
  // for pre-defined type denotations.
  optional string denotation = 6;
 }
@@ -696,7 +768,67 @@ message OperatorSetIdProto {
  optional int64 version = 2;
 }
+// Operator/function status.
+enum OperatorStatus {
+    EXPERIMENTAL = 0;
+    STABLE = 1;
+}
+message FunctionProto {
+  // The name of the function, similar usage of op_type in OperatorProto.
+  // Combined with FunctionProto.domain, this forms the unique identity of
+  // the FunctionProto.
+  optional string name = 1;
+  // Deprecated since IR Version 8
+  // optional int64 since_version = 2;
+  reserved 2;
+  reserved "since_version";
+  // Deprecated since IR Version 8
+  // optional OperatorStatus status = 3;
+  reserved 3;
+  reserved "status";
+  // The inputs and outputs of the function.
+  repeated string input = 4;
+  repeated string output = 5;
+  // The attribute parameters of the function.
+  // It is for function parameters without default values.
+  repeated string attribute = 6;
+  // The attribute protos of the function.
+  // It is for function attributes with default values.
+  // A function attribute shall be represented either as
+  // a string attribute or an AttributeProto, not both.
+  repeated AttributeProto attribute_proto = 11;
+  // The nodes in the function.
+  repeated NodeProto node = 7;
+  // A human-readable documentation for this function. Markdown is allowed.
+  optional string doc_string = 8;
+  // The OperatorSets this function body (graph) relies on.
+  //
+  // All nodes in the function body (graph) will bind against the operator
+  // with the same-domain/same-op_type operator with the HIGHEST version
+  // in the referenced operator sets. This means at most one version can be relied
+  // for one domain.
+  //
+  // The operator sets imported by FunctionProto should be compatible with the ones
+  // imported by ModelProto. Example, if same operator set say 'A' is imported by FunctionProto
+  // and ModelProto then versions for the operator set may be different but,
+  // the operator schema returned for op_type, domain, version combination
+  // for both the versions should be same.
+  repeated OperatorSetIdProto opset_import = 9;
+  // The domain which this function belongs to. Combined with FunctionProto.name, this forms the unique identity of
+  // the FunctionProto.
+  optional string domain = 10;
+}
 // For using protobuf-lite
 option optimize_for = LITE_RUNTIME;
\ No newline at end of file
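Reviewer sketch for the `TrainingInfoProto.algorithm` comment in the `onnx.proto` hunks above: the training graph is described as the concatenation of the inference graph's lists and the "algorithm" graph's lists, in that order. The code below is a minimal illustration of that concatenation, assuming simplified stand-in types. `toy_node`, `toy_graph`, `append` and `combine` are hypothetical names, not the generated protobuf classes, and only the input, output and node lists are modelled.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for NodeProto and GraphProto.
struct toy_node
{
    std::string op_type;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

struct toy_graph
{
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
    std::vector<toy_node> nodes;
};

// Append the elements of src to dst, preserving order.
template <class T>
void append(std::vector<T>& dst, const std::vector<T>& src)
{
    dst.insert(dst.end(), src.begin(), src.end());
}

// Build the training graph: inference lists first, then the "algorithm" lists.
toy_graph combine(const toy_graph& inference, const toy_graph& algorithm)
{
    toy_graph result = inference;
    append(result.inputs, algorithm.inputs);
    append(result.outputs, algorithm.outputs);
    append(result.nodes, algorithm.nodes);
    return result;
}
```

With the comment's own example (`MatMul` and `Sigmoid` in the inference graph, `Add` in the "algorithm" graph), the combined node list is `MatMul`, `Sigmoid`, `Add`. The `Add` node consumes `tensor_d` from the inference graph, while no inference node refers to anything defined in "algorithm".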
--- a/src/onnx/onnx_parser.cpp
+++ b/src/onnx/onnx_parser.cpp
@@ -34,7 +34,9 @@
 #include <migraphx/file_buffer.hpp>
 #include <migraphx/filesystem.hpp>
 #include <migraphx/op/unknown.hpp>
+#include <migraphx/float8.hpp>
 #include <migraphx/env.hpp>
+#include <onnx.pb.h>
 namespace migraphx {
 inline namespace MIGRAPHX_INLINE_NS {
@@ -484,6 +486,8 @@ literal onnx_parser::parse_value(const onnx::AttributeProto& attr) const
    case onnx::AttributeProto::TENSORS:
    case onnx::AttributeProto::SPARSE_TENSOR:
    case onnx::AttributeProto::SPARSE_TENSORS:
+    case onnx::AttributeProto::TYPE_PROTOS:
+    case onnx::AttributeProto::TYPE_PROTO:
    case onnx::AttributeProto::GRAPHS: return {};
    }
    MIGRAPHX_THROW("PARSE_VALUE: Invalid attribute type " + std::to_string(attr.type()));
@@ -545,6 +549,18 @@ literal onnx_parser::parse_tensor(const onnx::TensorProto& t) const
    case onnx::TensorProto::DOUBLE:
        return create_literal(shape::double_type, dims, t.double_data());
    case onnx::TensorProto::FLOAT: return create_literal(shape::float_type, dims, t.float_data());
+    case onnx::TensorProto::FLOAT8E4M3FNUZ: {
+        std::vector<int32_t> data_int32(t.int32_data().begin(), t.int32_data().end());
+        std::vector<migraphx::fp8::fp8e4m3fnuz> data_fp8;
+        std::transform(data_int32.begin(),
+                       data_int32.end(),
+                       std::back_inserter(data_fp8),
+                       [](float raw_val) { return migraphx::fp8::fp8e4m3fnuz{raw_val}; });
+        return create_literal(shape::fp8e4m3fnuz_type, dims, data_fp8);
+    }
+    case onnx::TensorProto::FLOAT8E5M2FNUZ:
+    case onnx::TensorProto::FLOAT8E5M2:
+    case onnx::TensorProto::FLOAT8E4M3FN:
    case onnx::TensorProto::UNDEFINED:
    case onnx::TensorProto::STRING:
    case onnx::TensorProto::COMPLEX64:
@@ -609,6 +625,13 @@ shape::type_t get_type(int dtype)
    case 11: return shape::double_type;
    case 12: return shape::uint32_type;
    case 13: return shape::uint64_type;
+    case 18: return shape::fp8e4m3fnuz_type;
+    case 14:
+    case 15:
+    case 16:
+    case 17:
+    case 19:
+    case 20:
    default: {
        MIGRAPHX_THROW("Prototensor data type " + std::to_string(dtype) + " not supported");
    }

--- a/src/targets/gpu/CMakeLists.txt
+++ b/src/targets/gpu/CMakeLists.txt
@@ -126,7 +126,6 @@ add_library(migraphx_gpu
    fuse_ck.cpp
    fuse_mlir.cpp
    fuse_ops.cpp
-    gather.cpp
    gemm_impl.cpp
    hip.cpp
    kernel.cpp
@@ -140,7 +139,6 @@ add_library(migraphx_gpu
    nonzero.cpp
    pack_args.cpp
    prefuse_ops.cpp
-    pad.cpp
    perfdb.cpp
    pooling.cpp
    reverse.cpp
@@ -168,12 +166,10 @@ endfunction()
 register_migraphx_gpu_ops(hip_
    argmax
    argmin
-    gather
    logsoftmax
    loop
    multinomial
    nonzero
-    pad
    prefix_scan_sum
    reverse
    scatter

--- a/src/targets/gpu/device/gather.cpp
+++ b/src/targets/gpu/device/gather.cpp
-/*
- * The MIT License (MIT)
- *
- * Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-#include <migraphx/shape.hpp>
-#include <migraphx/argument.hpp>
-#include <migraphx/gpu/device/gather.hpp>
-#include <migraphx/gpu/device/tensor.hpp>
-#include <migraphx/gpu/device/launch.hpp>
-#include <migraphx/gpu/device/types.hpp>
-namespace migraphx {
-inline namespace MIGRAPHX_INLINE_NS {
-namespace gpu {
-namespace device {
-argument gather(hipStream_t stream, argument result, argument arg1, argument arg2, int64_t axis)
-{
-    const auto& input_shape = arg1.get_shape();
-    auto lens               = input_shape.lens();
-    auto axis_dim_size      = lens[axis];
-    lens[axis]              = arg2.get_shape().elements();
-    shape out_comp_shape{result.get_shape().type(), lens};
-    std::size_t nelements = result.get_shape().elements();
-    visit_all(result, arg1)([&](auto output, auto input_v) {
-        hip_visit_views(input_v, out_comp_shape)([&](auto input, auto out_comp) {
-            arg2.visit([&](auto indices) {
-                const auto* indices_ptr = device_cast(indices.data());
-                auto* output_ptr        = device_cast(output.data());
-                gs_launch(stream, nelements, 256)([=](auto i) __device__ {
-                    auto idx      = out_comp.multi(i);
-                    auto in_index = indices_ptr[idx[axis]];
-                    in_index      = (in_index < 0) ? in_index + axis_dim_size : in_index;
-                    idx[axis]     = in_index;
-                    output_ptr[i] = input[idx];
-                });
-            });
-        });
-    });
-    return result;
-}
-} // namespace device
-} // namespace gpu
-} // namespace MIGRAPHX_INLINE_NS
-} // namespace migraphx
--- a/src/targets/gpu/device/pad.cpp
+++ b/src/targets/gpu/device/pad.cpp
-/*
- * The MIT License (MIT)
- *
- * Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-#include <migraphx/shape.hpp>
-#include <migraphx/argument.hpp>
-#include <migraphx/clamp.hpp>
-#include <migraphx/gpu/device/nary.hpp>
-#include <migraphx/gpu/device/pad.hpp>
-#include <migraphx/gpu/device/tensor.hpp>
-#include <migraphx/gpu/device/launch.hpp>
-#include <migraphx/float_equal.hpp>
-namespace migraphx {
-inline namespace MIGRAPHX_INLINE_NS {
-namespace gpu {
-namespace device {
-argument
-pad(hipStream_t stream, argument result, argument arg1, float value, std::vector<std::int64_t> pads)
-{
-    std::size_t nelements = arg1.get_shape().elements();
-    hip_visit_all(result, arg1)([&](auto output, auto input) {
-        using type      = typename decltype(output)::value_type;
-        using hip_index = typename decltype(output)::hip_index;
-        type device_val = pad_clamp<host_type<type>>(value);
-        gs_launch(stream, result.get_shape().elements())(
-            [=](auto i) __device__ { output.data()[i] = device_val; });
-        hip_index offsets;
-        std::copy(pads.begin(), pads.begin() + offsets.size(), offsets.begin());
-        gs_launch(stream, nelements)([=](auto i) __device__ {
-            auto idx = input.get_shape().multi(i);
-            for(std::size_t j = 0; j < offsets.size(); j++)
-            {
-                idx[j] += offsets[j];
-            }
-            output[idx] = input.data()[i];
-        });
-    });
-    return result;
-}
-} // namespace device
-} // namespace gpu
-} // namespace MIGRAPHX_INLINE_NS
-} // namespace migraphx
--- a/src/targets/gpu/fuse_mlir.cpp
+++ b/src/targets/gpu/fuse_mlir.cpp
@@ -114,10 +114,7 @@ struct mlir_op
            }
            if(ins->name() == "@return")
            {
-                auto s = ins_shapes[ins->inputs().at(0)].with_type(type);
+                return ins_shapes[ins->inputs().at(0)].with_type(type);
-                if(not s.standard())
-                    MIGRAPHX_THROW("MLIR doesnt support non-standard output");
-                return s;
            }
            std::vector<shape> input_shapes;
            input_shapes.resize(ins->inputs().size());
@@ -139,8 +136,15 @@ get_fusable_input_op_stream(instruction_ref lower_input)
 {
    instruction_ref upper_input = lower_input;
    std::vector<operation> op_stream;
-    while(
+    while(contains({"slice",
-        contains({"slice", "transpose", "contiguous", "reshape", "squeeze", "flatten", "unsqueeze"},
+                    "transpose",
+                    "multibroadcast",
+                    "broadcast",
+                    "contiguous",
+                    "reshape",
+                    "squeeze",
+                    "flatten",
+                    "unsqueeze"},
                   upper_input->name()))
    {
        operation op = upper_input->get_operator();

--- a/src/targets/gpu/gather.cpp
+++ b/src/targets/gpu/gather.cpp
-/*
- * The MIT License (MIT)
- *
- * Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-#include <migraphx/gpu/gather.hpp>
-#include <migraphx/gpu/context.hpp>
-#include <migraphx/gpu/device/gather.hpp>
-namespace migraphx {
-inline namespace MIGRAPHX_INLINE_NS {
-namespace gpu {
-shape hip_gather::compute_shape(std::vector<shape> inputs) const
-{
-    inputs.pop_back();
-    return op.normalize_compute_shape(inputs);
-}
-argument hip_gather::compute(context& ctx, const shape&, const std::vector<argument>& args) const
-{
-    return device::gather(ctx.get_stream().get(), args.back(), args[0], args[1], op.axis);
-}
-} // namespace gpu
-} // namespace MIGRAPHX_INLINE_NS
-} // namespace migraphx
--- a/src/targets/gpu/include/migraphx/gpu/device/gather.hpp
+++ b/src/targets/gpu/include/migraphx/gpu/device/gather.hpp
-/*
- * The MIT License (MIT)
- *
- * Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-#ifndef MIGRAPHX_GUARD_RTGLIB_DEVICE_GATHER_HPP
-#define MIGRAPHX_GUARD_RTGLIB_DEVICE_GATHER_HPP
-#include <migraphx/argument.hpp>
-#include <migraphx/gpu/device/config.hpp>
-#include <hip/hip_runtime_api.h>
-namespace migraphx {
-inline namespace MIGRAPHX_INLINE_NS {
-namespace gpu {
-namespace device {
-argument MIGRAPHX_DEVICE_EXPORT
-gather(hipStream_t stream, argument result, argument arg1, argument arg2, int64_t axis);
-} // namespace device
-} // namespace gpu
-} // namespace MIGRAPHX_INLINE_NS
-} // namespace migraphx
-#endif
--- a/src/targets/gpu/include/migraphx/gpu/device/pad.hpp
+++ b/src/targets/gpu/include/migraphx/gpu/device/pad.hpp
-/*
- * The MIT License (MIT)
- *
- * Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-#ifndef MIGRAPHX_GUARD_RTGLIB_DEVICE_PAD_HPP
-#define MIGRAPHX_GUARD_RTGLIB_DEVICE_PAD_HPP
-#include <migraphx/argument.hpp>
-#include <migraphx/gpu/device/config.hpp>
-#include <hip/hip_runtime_api.h>
-namespace migraphx {
-inline namespace MIGRAPHX_INLINE_NS {
-namespace gpu {
-namespace device {
-argument MIGRAPHX_DEVICE_EXPORT pad(hipStream_t stream,
-                                    argument result,
-                                    argument arg1,
-                                    float value,
-                                    std::vector<std::int64_t> pads);
-} // namespace device
-} // namespace gpu
-} // namespace MIGRAPHX_INLINE_NS
-} // namespace migraphx
-#endif
--- a/src/targets/gpu/include/migraphx/gpu/gather.hpp
+++ b/src/targets/gpu/include/migraphx/gpu/gather.hpp
-/*
- * The MIT License (MIT)
- *
- * Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-#ifndef MIGRAPHX_GUARD_RTGLIB_GATHER_HPP
-#define MIGRAPHX_GUARD_RTGLIB_GATHER_HPP
-#include <migraphx/argument.hpp>
-#include <migraphx/reflect.hpp>
-#include <migraphx/op/gather.hpp>
-#include <migraphx/gpu/context.hpp>
-namespace migraphx {
-inline namespace MIGRAPHX_INLINE_NS {
-namespace gpu {
-struct context;
-struct hip_gather
-{
-    op::gather op;
-    template <class Self, class F>
-    static auto reflect(Self& self, F f)
-    {
-        return migraphx::reflect(self.op, f);
-    }
-    std::string name() const { return "gpu::gather"; }
-    shape compute_shape(std::vector<shape> inputs) const;
-    argument
-    compute(context& ctx, const shape& output_shape, const std::vector<argument>& args) const;
-    std::ptrdiff_t output_alias(const std::vector<shape>& shapes) const
-    {
-        return shapes.size() - 1;
-    }
-};
-} // namespace gpu
-} // namespace MIGRAPHX_INLINE_NS
-} // namespace migraphx
-#endif
--- a/src/targets/gpu/include/migraphx/gpu/pad.hpp
+++ b/src/targets/gpu/include/migraphx/gpu/pad.hpp
-/*
- * The MIT License (MIT)
- *
- * Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-#ifndef MIGRAPHX_GUARD_RTGLIB_PAD_HPP
-#define MIGRAPHX_GUARD_RTGLIB_PAD_HPP
-#include <migraphx/argument.hpp>
-#include <migraphx/reflect.hpp>
-#include <migraphx/op/pad.hpp>
-namespace migraphx {
-inline namespace MIGRAPHX_INLINE_NS {
-namespace gpu {
-struct context;
-struct hip_pad
-{
-    op::pad op;
-    template <class Self, class F>
-    static auto reflect(Self& self, F f)
-    {
-        return migraphx::reflect(self.op, f);
-    }
-    std::string name() const { return "gpu::pad"; }
-    shape compute_shape(std::vector<shape> inputs) const;
-    argument
-    compute(context& ctx, const shape& output_shape, const std::vector<argument>& args) const;
-    std::ptrdiff_t output_alias(const std::vector<shape>& shapes) const
-    {
-        return shapes.size() - 1;
-    }
-};
-} // namespace gpu
-} // namespace MIGRAPHX_INLINE_NS
-} // namespace migraphx
-#endif