manual merge

Commit 4ea39116 authored Nov 10, 2023 by Khalique Ahmed
20 changed files
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
-# Change Log for MIGraphX
+# Changelog for MIGraphX

-Full documentation for MIGraphX is available at [MIGraphX Documentation](https://rocmdocs.amd.com/projects/AMDMIGraphX/en/latest/).
+Full documentation for MIGraphX is available at
+[https://rocmdocs.amd.com/projects/AMDMIGraphX/en/latest/](https://rocmdocs.amd.com/projects/AMDMIGraphX/en/latest/).
+
+## MIGraphX 2.8 for ROCm 6.0.0
+
+### Additions
+
+* Support for MI300 GPUs
+* Support for TorchMIGraphX via PyTorch
+* Boosted overall performance by integrating rocMLIR
+* INT8 support for ONNX Runtime
+* Support for ONNX version 1.14.1
+* Added new operators: `Qlinearadd`, `QlinearGlobalAveragePool`, `Qlinearconv`, `Shrink`, `CastLike`,
+  and `RandomUniform`
+* Added an error message for when `gpu_targets` is not set during MIGraphX compilation
+* Added a parameter to set tolerances with `migraphx-driver verify`
+* Added support for MXR files > 4 GB
+* Added `MIGRAPHX_TRACE_MLIR` flag
+* (Beta) Added the capability to use ROCm Composable Kernels via the `MIGRAPHX_ENABLE_CK=1`
+  environment variable
+
+### Optimizations
+
+* Improved performance support for INT8
+* Improved time precision while benchmarking candidate kernels from CK or MLIR
+* Removed contiguous from reshape parsing
+* Updated the `ConstantOfShape` operator to support Dynamic Batch
+* Simplified dynamic shapes-related operators to their static versions, where possible
+* Improved debugging tools for accuracy issues
+* Added a warning about `miopen_fusion` that prints while generating `mxr` files
+* Reduced overall system memory usage during model compilation
+* Created additional fusion opportunities during model compilation
+* Improved debugging for matchers
+* Improved general debug messages
+
+### Fixes
+
+* Fixed scatter operator for nonstandard shapes with some models from ONNX Model Zoo
+* Provided a compile option to improve the accuracy of some models by disabling Fast-Math
+* Improved layernorm + pointwise fusion matching to ignore argument order
+* Fixed accuracy issue with `ROIAlign` operator
+* Fixed computation logic for the `Trilu` operator
+* Fixed support for the DETR model
+
+### Changes
+
+* Changed MIGraphX version to 2.8
+* Extracted the test packages into a separate deb file when building MIGraphX from source
+
+### Removals
+
+* Removed building Python 2.7 bindings

 ## MIGraphX 2.7 for ROCm 5.7.0
-### Added
- Enabled hipRTC to not require dev packages for migraphx runtime and allow the ROCm install to be in a different directory than it was during build time
- Add support for multi-target execution
- Added Dynamic Batch support with C++/Python APIs
- Add migraphx.create_argument to python API
- Added dockerfile example for Ubuntu 22.04
- Add TensorFlow supported ops in driver similar to exist onnx operator list
- Add a MIGRAPHX_TRACE_MATCHES_FOR env variable to filter the matcher trace
- Improved debugging by printing max,min,mean and stddev values for TRACE_EVAL = 2
- use fast_math flag instead of ENV flag for GELU
- Print message from driver if offload copy is set for compiled program
+
+### Additions
+
+* hipRTC no longer requires dev packages for the MIGraphX runtime and allows the ROCm install to be in a
+  different directory than it was during build time
+* Added support for multi-target execution
+* Added Dynamic Batch support with C++/Python APIs
+* Added `migraphx.create_argument` to Python API
+* Added dockerfile example for Ubuntu 22.04
+* Added TensorFlow supported ops in the driver, similar to the existing ONNX operator list
+* Added a `MIGRAPHX_TRACE_MATCHES_FOR` environment variable to filter the matcher trace
+* Improved debugging by printing max, min, mean, and stddev values for `TRACE_EVAL=2`
+* Used the `fast_math` flag instead of an `ENV` flag for GELU
+* The driver now prints a message if offload copy is set for the compiled program
+
 ### Optimizations
- Optimized for ONNX Runtime 1.14.0
- Improved compile times by only building for the GPU on the system
- Improve performance of pointwise/reduction kernels when using NHWC layouts
- Load specific version of the migraphx_py library
- Annotate functions with the block size so the compiler can do a better job of optimizing 
- Enable reshape on nonstandard shapes
- Use half HIP APIs to compute max and min
- Added support for broadcasted scalars to unsqueeze operator
- Improved multiplies with dot operator
- Handle broadcasts across dot and concat
- Add verify namespace for better symbol resolution
-### Fixed
- Resolved accuracy issues with FP16 resnet50
- Update cpp generator to handle inf from  float
- Fix assertion error during verify and make DCE work with tuples
- Fix convert operation for NaNs
- Fix shape typo in API test
- Fix compile warnings for shadowing variable names
- Add missing specialization for the `nullptr` for the hash function
-### Changed
- Bumped version of half library to 5.6.0
- Bumped CI to support rocm 5.6
- Make building tests optional
- replace np.bool with bool as per numpy request
-### Removed
- Removed int8x4 rocBlas calls due to deprecation
- removed std::reduce usage since not all OS' support it

+* Optimized for ONNX Runtime 1.14.0
+* Improved compile times by only building for the GPU on the system
+* Improved performance of pointwise/reduction kernels when using NHWC layouts
+* Loaded specific version of the `migraphx_py` library
+* Annotated functions with the block size so the compiler can do a better job of optimizing
+* Enabled reshape on nonstandard shapes
+* Used half HIP APIs to compute max and min
+* Added support for broadcasted scalars to unsqueeze operator
+* Improved multiplies with dot operator
+* Handled broadcasts across dot and concat
+* Added verify namespace for better symbol resolution
+
+### Fixes
+
+* Resolved accuracy issues with FP16 resnet50
+* Updated cpp generator to handle inf from float
+* Fixed assertion error during verify and made DCE work with tuples
+* Fixed convert operation for NaNs
+* Fixed shape typo in API test
+* Fixed compile warnings for shadowing variable names
+* Added missing specialization for the `nullptr` hash function
+
+### Changes
+
+* Bumped version of half library to 5.6.0
+* Bumped CI to support ROCm 5.6
+* Made building tests optional
+* Replaced `np.bool` with `bool` per NumPy request
+
+### Removals
+
+* Removed int8x4 rocBlas calls due to deprecation
+* Removed `std::reduce` usage because not all operating systems support it

 ## MIGraphX 2.5 for ROCm 5.5.0
-### Added
- Y-Model feature to store tuning information with the optimized model
- Added Python 3.10 bindings 
- Accuracy checker tool based on ONNX Runtime
- ONNX Operators parse_split, and Trilu 
- Build support for ROCm MLIR
- Added migraphx-driver flag to print optimizations in python (--python)
- Added JIT implementation of the Gather and Pad operator which results in better handling of larger tensor sizes.
+
+### Additions
+
+* Added the Y-Model feature, which stores tuning information with the optimized model
+* Added Python 3.10 bindings
+* Added an accuracy checker tool based on ONNX Runtime
+* Added ONNX operators `parse_split` and `Trilu`
+* Added build support for ROCm MLIR
+* Added a `migraphx-driver` flag (`--python`) to print optimizations in Python
+* Added JIT implementation of the Gather and Pad operators, which results in better handling for
+  larger tensor sizes
+
 ### Optimizations
- Improved performance of Transformer based models
- Improved performance of the Pad, Concat, Gather, and Pointwise operators
- Improved onnx/pb file loading speed
- Added general optimize pass which runs several passes such as simplify_reshapes/algebra and DCE in loop.
-### Fixed
- Improved parsing Tensorflow Protobuf files 
- Resolved various accuracy issues with some onnx models
- Resolved a gcc-12 issue with mivisionx
- Improved support for larger sized models and batches
- Use --offload-arch instead of --cuda-gpu-arch for the HIP compiler
- Changes inside JIT to use float accumulator for large reduce ops of half type to avoid overflow.
- Changes inside JIT to temporarily use cosine to compute sine function.
-### Changed
- Changed version/location of 3rd party build dependencies to pick up fixes
+
+* Improved performance of Transformer-based models
+* Improved performance of the `Pad`, `Concat`, `Gather`, and `Pointwise` operators
+* Improved ONNX/pb file loading speed
+* Added a general optimize pass that runs several passes, such as `simplify_reshapes`, algebra, and DCE
+  in a loop
+
+### Fixes
+
+* Improved parsing for TensorFlow Protobuf files
+* Resolved various accuracy issues with some ONNX models
+* Resolved a gcc-12 issue with MIVisionX
+* Improved support for larger-sized models and batches
+* Used `--offload-arch` instead of `--cuda-gpu-arch` for the HIP compiler
+* Changed JIT to use a float accumulator for large reduce ops of half type to avoid overflow
+* Changed JIT to temporarily use cosine to compute the sine function
+
+### Changes
+
+* Changed the version and location of third-party build dependencies to pick up fixes
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -27,20 +27,18 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_BINARY_DIR}")
    message(FATAL_ERROR "The binary and source directroy cannot be the same")
 endif()

-get_property(_GENERATOR_IS_MULTI_CONFIG GLOBAL PROPERTY GENERATOR_IS_MULTI_CONFIG)
+# Setup valid strings for build type
+if (NOT CMAKE_CONFIGURATION_TYPES)
+    set(CMAKE_CONFIGURATION_TYPES "Debug;Release;RelWithDebInfo;MinSizeRel" CACHE STRING "Configs")
+endif()

+get_property(MIGRAPHX_GENERATOR_IS_MULTI_CONFIG GLOBAL PROPERTY GENERATOR_IS_MULTI_CONFIG)
 # This has to be initialized before the project() command appears
 # Set the default of CMAKE_BUILD_TYPE to be release, unless user specifies with -D.  MSVC_IDE does not use CMAKE_BUILD_TYPE
-if(_GENERATOR_IS_MULTI_CONFIG)
-    if (NOT CMAKE_CONFIGURATION_TYPES)
-        set(CMAKE_CONFIGURATION_TYPES "Debug;Release;RelWithDebInfo;MinSizeRel" CACHE STRING
-            "Available build types (configurations) on multi-config generators")
-    endif()
-else()
-    if(NOT CMAKE_BUILD_TYPE)
-        set(CMAKE_BUILD_TYPE Release CACHE STRING
-            "Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel.")
-    endif()
+if(NOT MIGRAPHX_GENERATOR_IS_MULTI_CONFIG)
+    set(CMAKE_BUILD_TYPE Release CACHE STRING
+        "Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel.")
+    set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS ${CMAKE_CONFIGURATION_TYPES})
 endif()

 set(CMAKE_INSTALL_PREFIX "/opt/rocm" CACHE PATH "")
@@ -53,6 +51,18 @@ include(CTest)
 find_package(ROCM REQUIRED)
 find_package(Threads REQUIRED)

+if(WIN32)
+option(MIGRAPHX_ENABLE_PYTHON "Enable python bindings" OFF)
+else()
+option(MIGRAPHX_ENABLE_PYTHON "Enable python bindings" ON)
+endif()
+
+if(WIN32) # CK is not yet ported to Windows
+option(MIGRAPHX_USE_COMPOSABLEKERNEL "Enable MIGraphX to use composable kernel JIT library" OFF)
+else()
+option(MIGRAPHX_USE_COMPOSABLEKERNEL "Enable MIGraphX to use composable kernel JIT library" ON)
+endif()
+
 find_path(HALF_INCLUDE_DIR half.hpp PATH_SUFFIXES half)
 if (NOT HALF_INCLUDE_DIR)
    message(FATAL_ERROR "Could not find half.hpp - Please check that the install path of half.hpp has been added to CMAKE_PREFIX_PATH")
@@ -71,8 +81,9 @@ include(ROCMSetupVersion)

 option(BUILD_DEV "Build for development purpose only" OFF)

-rocm_setup_version(VERSION 2.8.0)
-set(MIGRAPHX_SO_VERSION ${PROJECT_VERSION_MAJOR}.${PROJECT_VERSION_MINOR}.${PROJECT_VERSION_PATCH})
+rocm_setup_version(VERSION 2.9.0)
+math(EXPR MIGRAPHX_SO_MAJOR_VERSION "(${PROJECT_VERSION_MAJOR} * 1000 * 1000) + (${PROJECT_VERSION_MINOR} * 1000) + ${PROJECT_VERSION_PATCH}")
+set(MIGRAPHX_SO_VERSION ${MIGRAPHX_SO_MAJOR_VERSION}.0)

 option( BUILD_SHARED_LIBS "Build as a shared library" ON )

@@ -261,8 +272,6 @@ rocm_enable_cppcheck(
        MIGRAPHX_USE_CLANG_TIDY
 )

-enable_testing()
-
 include(ROCMCreatePackage)
 include(ROCMTest)
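The new `math(EXPR ...)` line in `CMakeLists.txt` encodes the major, minor, and patch components into one integer, and `MIGRAPHX_SO_VERSION` becomes that integer followed by `.0` (previously it was simply `major.minor.patch`). As a worked example, here is a minimal C++ sketch of the same arithmetic; the helper names `so_major_version` and `so_version` are hypothetical and not part of MIGraphX:

```cpp
#include <string>

// Hypothetical helper mirroring the CMake expression:
//   (major * 1000 * 1000) + (minor * 1000) + patch
// The packing stays unambiguous only while minor and patch are below 1000.
constexpr long so_major_version(long major, long minor, long patch)
{
    return (major * 1000 * 1000) + (minor * 1000) + patch;
}

// Hypothetical helper: MIGRAPHX_SO_VERSION is the encoded value with ".0" appended.
inline std::string so_version(long major, long minor, long patch)
{
    return std::to_string(so_major_version(major, minor, patch)) + ".0";
}

// rocm_setup_version(VERSION 2.9.0): 2 * 1000000 + 9 * 1000 + 0 = 2009000
static_assert(so_major_version(2, 9, 0) == 2009000, "2.9.0 packs to 2009000");
```

For version 2.9.0 this gives `2009000`, so `MIGRAPHX_SO_VERSION` is `2009000.0`. As long as minor and patch stay below 1000, distinct releases map to distinct, increasing integers.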


--- a/Dockerfile
+++ b/Dockerfile
@@ -80,6 +80,10 @@ ADD rbuild.ini /rbuild.ini
 # Temporarily install a new cmake until switching to ubuntu 22.04
 RUN pip3 install cmake==3.22.1

+# Location where ONNX unit test models are cached
+ENV ONNX_HOME=/.onnx
+RUN mkdir -p $ONNX_HOME/models && chmod 777 $ONNX_HOME/models
+
 COPY ./tools/install_prereqs.sh /
 RUN /install_prereqs.sh /usr/local / && rm /install_prereqs.sh
 RUN test -f /usr/local/hash || exit 1
@@ -91,11 +95,6 @@ RUN pip3 install yapf==0.28.0
 ADD docs/.sphinx/requirements.txt /doc-requirements.txt
 RUN pip3 install -r /doc-requirements.txt

-# Download real models to run onnx unit tests
-ENV ONNX_HOME=/.onnx
-COPY ./tools/download_models.sh /
-RUN /download_models.sh && rm /download_models.sh
-
 # Install latest ccache version
 RUN cget -p $PREFIX install facebook/zstd@v1.4.5 -X subdir -DCMAKE_DIR=build/cmake
 RUN cget -p $PREFIX install ccache@v4.1 -DENABLE_TESTING=OFF

--- a/Jenkinsfile
+++ b/Jenkinsfile
@@ -30,7 +30,7 @@ def rocmtestnode(Map conf) {
            rm -rf build
            mkdir build
            cd build
-            cmake -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DBUILD_DEV=On -DCMAKE_EXECUTE_PROCESS_COMMAND_ECHO=STDOUT ${flags} ..
+            cmake -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DBUILD_DEV=On -DCMAKE_EXECUTE_PROCESS_COMMAND_ECHO=STDOUT -DMIGRAPHX_DISABLE_VIRTUAL_ENV=ON ${flags} ..
            git diff
            git diff-index --quiet HEAD || (echo "Git repo is not clean after running cmake." && exit 1)
            make -j\$(nproc) generate VERBOSE=1
@@ -107,12 +107,15 @@ def rocmnode(name, body) {
    }
 }

-rocmtest clang_debug: rocmnode('cdna') { cmake_build ->
+rocmtest clang_debug: rocmnode('mi100+') { cmake_build ->
    stage('hipRTC Debug') {
-        def sanitizers = "undefined"
-        def debug_flags = "-g -O2 -fsanitize=${sanitizers} -fno-sanitize-recover=${sanitizers}"
-        def gpu_targets = getgputargets()
-        cmake_build(flags: "-DCMAKE_BUILD_TYPE=debug -DMIGRAPHX_ENABLE_PYTHON=Off -DCMAKE_CXX_FLAGS_DEBUG='${debug_flags}' -DCMAKE_C_FLAGS_DEBUG='${debug_flags}' -DMIGRAPHX_USE_HIPRTC=On -DGPU_TARGETS='${gpu_targets}'", gpu_debug: true)
+        // Disable MLIR since it doesn't work with all UB sanitizers
+        withEnv(['MIGRAPHX_DISABLE_MLIR=1']) {
+            def sanitizers = "undefined"
+            def debug_flags = "-g -O2 -fsanitize=${sanitizers} -fno-sanitize-recover=${sanitizers}"
+            def gpu_targets = getgputargets()
+            cmake_build(flags: "-DCMAKE_BUILD_TYPE=debug -DMIGRAPHX_ENABLE_PYTHON=Off -DCMAKE_CXX_FLAGS_DEBUG='${debug_flags}' -DCMAKE_C_FLAGS_DEBUG='${debug_flags}' -DMIGRAPHX_USE_HIPRTC=On -DGPU_TARGETS='${gpu_targets}'", gpu_debug: true)
+        }
    }
 }, clang_release: rocmnode('mi100+') { cmake_build ->
    stage('Hip Clang Release') {
@@ -124,14 +127,14 @@ rocmtest clang_debug: rocmnode('cdna') { cmake_build ->
 //     stage('Hidden symbols') {
 //         cmake_build(flags: "-DMIGRAPHX_ENABLE_PYTHON=Off -DMIGRAPHX_ENABLE_GPU=On -DMIGRAPHX_ENABLE_CPU=On -DCMAKE_CXX_VISIBILITY_PRESET=hidden -DCMAKE_C_VISIBILITY_PRESET=hidden")
 //     }
-}, all_targets_debug : rocmnode('cdna') { cmake_build ->
+}, all_targets_debug : rocmnode('mi100+') { cmake_build ->
    stage('All targets Release') {
        def gpu_targets = getgputargets()
        cmake_build(flags: "-DCMAKE_BUILD_TYPE=release -DMIGRAPHX_ENABLE_GPU=On -DMIGRAPHX_ENABLE_CPU=On -DMIGRAPHX_ENABLE_FPGA=On -DGPU_TARGETS='${gpu_targets}'")
    }
-}, mlir_debug: rocmnode('cdna') { cmake_build ->
+}, mlir_debug: rocmnode('mi100+') { cmake_build ->
    stage('MLIR Debug') {
-        withEnv(['MIGRAPHX_ENABLE_MLIR=1']) {
+        withEnv(['MIGRAPHX_ENABLE_EXTRA_MLIR=1']) {
            def sanitizers = "undefined"
            // Note: the -fno-sanitize= is copied from upstream LLVM_UBSAN_FLAGS.
            def debug_flags_cxx = "-g -O2 -fsanitize=${sanitizers} -fno-sanitize=vptr,function -fno-sanitize-recover=${sanitizers}"
@@ -142,7 +145,7 @@ rocmtest clang_debug: rocmnode('cdna') { cmake_build ->
    }
 }, ck_hiprtc: rocmnode('mi100+') { cmake_build ->
    stage('CK hipRTC') {
-        withEnv(['MIGRAPHX_ENABLE_CK=1', 'MIGRAPHX_TUNE_CK=1']) {
+        withEnv(['MIGRAPHX_ENABLE_CK=1', 'MIGRAPHX_TUNE_CK=1', 'MIGRAPHX_DISABLE_MLIR=1']) {
            def gpu_targets = getgputargets()
            cmake_build(flags: "-DCMAKE_BUILD_TYPE=release -DMIGRAPHX_USE_HIPRTC=On -DGPU_TARGETS='${gpu_targets}'")
        }

--- a/README.md
+++ b/README.md
 # AMD MIGraphX

-AMD MIGraphX is AMD's graph inference engine that accelerates machine learning model inference. AMD MIGraphX can be used by
-installing binaries directly or building from source code.
+AMD MIGraphX is AMD's graph inference engine, which accelerates machine learning model inference.
+To use MIGraphX, you can install the binaries or build from source code. Refer to the following sections
+for Ubuntu installation instructions (we'll provide instructions for other Linux distributions in the future).

-In the following, instructions of how to build and install MIGraphX are described with Ubuntu as the OS
-(Instructions of installation on other Linux OSes will come later). Note that all the following instructions assume 
-ROCm has been installed successfully. ROCm installation instructions are explained in the [ROCm installation
-guide](https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html).
+```note
+You must [install ROCm](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html) before
+installing MIGraphX.
+```

 ## Installing from binaries
-With ROCm installed correctly, MIGraphX binaries can be installed on Ubuntu with the following command:
-```
+
+Install binaries using:
+
+```bash
 sudo apt update && sudo apt install -y migraphx
 ```
-then the header files and libs are installed under `/opt/rocm-<version>`, where `<version>` is the ROCm version.
+
+Header files and libraries are installed under `/opt/rocm-<version>`, where `<version>` is the ROCm
+version.

 ## Building from source

-There are three ways to build the MIGraphX sources. 
-* [Use the ROCm build tool](#use-the-rocm-build-tool-rbuild)
-    
-    This approach uses [rbuild](https://github.com/RadeonOpenCompute/rbuild) to install the prerequisites and
-build the libs with just one command. 
+You have three options for building from source:

-* [Use cmake](#use-cmake-to-build-migraphx)
-    
-    This approach uses a script to install the prerequisites, then use cmake to build the source.
-      
-* [Use docker](#use-docker)
-    
-    This approach builds a docker image with all prerequisites installed, then build the MIGraphX sources inside a docker container. 
+* [ROCm build tool](#use-the-rocm-build-tool-rbuild): Uses
+  [rbuild](https://github.com/RadeonOpenCompute/rbuild) to install prerequisites, then you can build
+  the libraries with a single command.

-In the following, we will first list the prerequisites required to build MIGraphX source code, then describe 
-each of the three approaches.
+* [CMake](#use-cmake-to-build-migraphx): Uses a script to install prerequisites, then you can use
+  CMake to build the source.

-### List of prerequisites
-The following is a list of prerequisites required to build MIGraphX source. 
+* [Docker](#use-docker): Builds a Docker image with all prerequisites installed, then you can build the
+  MIGraphX sources inside a Docker container.

-* [ROCm cmake modules](https://github.com/RadeonOpenCompute/rocm-cmake) **required**
+### Build prerequisites
+
+The following is a list of prerequisites for building MIGraphX.
+
+* [ROCm CMake modules](https://github.com/RadeonOpenCompute/rocm-cmake) **required**
 * [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen) for running on the GPU
 * [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS) for running on the GPU
 * [HIP](https://github.com/ROCm-Developer-Tools/HIP) for running on the GPU
-* [Protobuf](https://github.com/google/protobuf) for reading [onnx](https://github.com/onnx/onnx) files
-* [Half](http://half.sourceforge.net/) - IEEE 754-based half-precision floating point library
-* [pybind11](https://pybind11.readthedocs.io/en/stable/) - for python bindings
-* [JSON](https://github.com/nlohmann/json) - for model serialization to json string format
-* [MessagePack](https://msgpack.org/index.html) - for model serialization to binary format
-* [SQLite3](https://www.sqlite.org/index.html) - to create database of kernels' tuning information or execute queries on existing database
+* [Protobuf](https://github.com/google/protobuf) for reading [ONNX](https://github.com/onnx/onnx)
+  files
+* [Half](http://half.sourceforge.net/), an IEEE 754-based half-precision floating point library
+* [pybind11](https://pybind11.readthedocs.io/en/stable/) for Python bindings
+* [JSON](https://github.com/nlohmann/json) for model serialization to JSON string format
+* [MessagePack](https://msgpack.org/index.html) for model serialization to binary format
+* [SQLite3](https://www.sqlite.org/index.html) to create a database of kernels' tuning information or run queries on an existing database

-#### Use the ROCm build tool [rbuild](https://github.com/RadeonOpenCompute/rbuild).
+### Use the ROCm build tool [rbuild](https://github.com/RadeonOpenCompute/rbuild)

-In this approach, we use the [rbuild](https://github.com/RadeonOpenCompute/rbuild) build tool to
-build MIGraphX. The specific steps are as follows:
+1. Install `rocm-cmake`, `pip3`, `rocblas`, and `miopen-hip`:

-1) Install rocm-cmake, pip3, rocblas, and miopen-hip with the command
+    ```bash
+    sudo apt install -y rocm-cmake python3-pip rocblas miopen-hip
+    ```

-```
-sudo apt install -y rocm-cmake python3-pip rocblas miopen-hip
-```
+2. Install [rbuild](https://github.com/RadeonOpenCompute/rbuild) (sudo may be required):

-2) Install [rbuild](https://github.com/RadeonOpenCompute/rbuild) (sudo may be required here.)
+    ```bash
+    pip3 install https://github.com/RadeonOpenCompute/rbuild/archive/master.tar.gz
+    ```

-```
-pip3 install https://github.com/RadeonOpenCompute/rbuild/archive/master.tar.gz
-```
+3. Build MIGraphX source code:

-3) Build MIGraphX source code
+    ```bash
+    rbuild build -d depend -B build -DGPU_TARGETS=$(/opt/rocm/bin/rocminfo | grep -o -m1 'gfx.*')
+    ```

-```
-rbuild build -d depend -B build
+Once completed, all prerequisites are in the `depend` folder and MIGraphX is in the `build` directory.
+
+```note
+If you get an `rbuild: command not found` error, it's because `rbuild` is installed in `$HOME/.local/bin`,
+which is not in `PATH`. You can either export PATH as `export PATH=$HOME/.local/bin:$PATH` to add
+the folder to `PATH`, or add the option `--prefix /usr/local` in the pip3 command when installing `rbuild`.
 ```

-then all the prerequisites are in the folder `depend`, and MIGraphX is built in the `build` directory.
+### Use CMake to build MIGraphX

-Also note that you may meet the error of `rbuild: command not found`. It is because rbuild is installed 
-at `$HOME/.local/bin`, which is not in `PATH`. You can either export PATH as `export PATH=$HOME/.local/bin:$PATH` 
-to add the folder to `PATH` or add the option `--prefix /usr/local` in the pip3 command when installing rbuild.
+1. Install the prerequisites:

-#### Use cmake to build MIGraphX
+    ```bash
+    rbuild prepare -d depend
+    ```

-If using this approach, we need to install the prerequisites, configure the cmake, and then build the source.
+    This puts all the prerequisites in the `depend` folder. They can be used in the `cmake`
+    configuration as `-DCMAKE_PREFIX_PATH=depend`.

-##### Installing the prerequisites
+    If you have sudo access, as an alternative to the `rbuild` command, you can install the prerequisites
+    the same way the Dockerfile does, by calling `./tools/install_prereqs.sh`.

-For convenience, the prerequisites can be built automatically with rbuild as:
+    This script is for Ubuntu. By default, it installs all prerequisites in `/usr/local`, where they are
+    accessible by all users, and `sudo` is required to run it. You can also specify a different location
+    using `./tools/install_prereqs.sh $custom_location`.

-```
-rbuild prepare -d depend
-```
+2. Go to the project folder and create a `build` directory:

-then all the prerequisites are in the folder `depend`, and they can be used in the `cmake` configuration
-as `-DCMAKE_PREFIX_PATH=depend`.
+    ```bash
+    mkdir build
+    cd build
+    ```

-If you have sudo access, as an alternative to the rbuild command, you can install the prerequisites just 
-like in the dockerfile by calling `./tools/install_prereqs.sh`.
+3. Configure CMake. If the prerequisites are installed at the default location `/usr/local`, use:

-(Note that this script is for Ubuntu. By default, all prerequisites are installed at the default location `/usr/local` 
-and are accessible by all users. For the default location, `sudo` is required to run the script.
-You can also specify a location at which the prerequisites are installed with `./tools/install_prereqs.sh $your_loc`.)
+    ```bash
+    CXX=/opt/rocm/llvm/bin/clang++ cmake .. -DGPU_TARGETS=$(/opt/rocm/bin/rocminfo | grep -o -m1 'gfx.*')
+    ```

-##### Building MIGraphX source and install libs
+    Otherwise, set `-DCMAKE_PREFIX_PATH=$custom_location` when configuring CMake.

-With the above prerequisites installed, we can build source as:
+4. Build MIGraphX source code:

-1) Go to the project folder and create a `build` directory:
+    ```bash
+    make -j$(nproc)
+    ```

+    You can verify this using:

-```
-mkdir build
-cd build
-```
+    ```bash
+    make -j$(nproc) check
+    ```

-2) Configure the cmake. If the prerequisites are installed at the default location `/usr/local`, the command is:
+5. Install MIGraphX libraries:

-```
-CXX=/opt/rocm/llvm/bin/clang++ cmake ..
-```
-Otherwise, you need to set `-DCMAKE_PREFIX_PATH=$your_loc` to configure the cmake. 
+    ```bash
+    make install
+    ```

-3) Build MIGraphX source code
+### Use Docker

-```
-make -j$(nproc)
-```
+The easiest way to set up the development environment is to use Docker.

-Correctness can be verified as:
+1. With the Dockerfile, build a Docker image:

-```
-make -j$(nproc) check
-```
+    ```bash
+    docker build -t migraphx .
+    ```

-MIGraphX libs can be installed as:
+2. Enter the development environment using `docker run`:

-```
-make install
-```
+    ```bash
+    docker run --device='/dev/kfd' --device='/dev/dri' -v=`pwd`:/code/AMDMIGraphX -w /code/AMDMIGraphX --group-add video -it migraphx
+    ```

-#### Use docker
+3. In the Docker container, all required prerequisites are already installed, so you can go to the folder
+    `/code/AMDMIGraphX` and follow the steps (starting from step 2) in
+    [Use CMake to build MIGraphX](#use-cmake-to-build-migraphx).

-The easiest way to setup the development environment is to use docker. With the dockerfile, you can build a docker image as:
+## Using the MIGraphX Python module

-    docker build -t migraphx .
+To use MIGraphX's Python module, you can set `PYTHONPATH` or use the `.deb` package:

-Then to enter the developement environment use `docker run`:
+* Setting `PYTHONPATH`:

-    docker run --device='/dev/kfd' --device='/dev/dri' -v=`pwd`:/code/AMDMIGraphX -w /code/AMDMIGraphX --group-add video -it migraphx
+    ```bash
+    export PYTHONPATH=/opt/rocm/lib:$PYTHONPATH
+    ```

-In the docker container, all the required prerequisites are already installed, so users can just go to the folder 
-`/code/AMDMIGraphX` and follow the steps in the above [Build MIGraphX source and install
-libs](#building-migraphx-source-and-install-libs)
-section to build MIGraphX source.
+* Creating the `deb` package:

-### Using MIGraphX Python Module
-To use MIGraphX's Python module, please either set `PYTHONPATH` or use `.deb` package as explained below:
+    ```bash
+    make package
+    ```

- Setting `PYTHONPATH` :
-```
-export PYTHONPATH=/opt/rocm/lib:$PYTHONPATH
-```
- Creating and installing the package:
+    This outputs the path to the `.deb` package.

-To create deb package:
-```
-make package
-```
-This will provide the path of .deb package.
+    To install:

-To install:
-```
-dpkg -i <path_to_deb_file>
-```
+    ```bash
+    dpkg -i <path_to_deb_file>
+    ```

-### Calling MIGraphX APIs
-To use MIGraphX's C/C++ API in your cmake project, we need to set `CMAKE_PREFIX_PATH` to the MIGraphX
-installation location and then do 
-```
+## Calling MIGraphX APIs
+
+To use MIGraphX's C/C++ API in your CMake project, you must set `CMAKE_PREFIX_PATH` to the
+MIGraphX installation location and add the following to your `CMakeLists.txt`:
+
+```cmake
 find_package(migraphx)
 target_link_libraries(myApp migraphx::c)
 ```
-Where `myApp` is the cmake target in your project.
+
+Where `myApp` is the CMake target in your project.

 ## Building for development

-Using rbuild, the dependencies for development can be installed with:
+Using `rbuild`, you can install the dependencies for development with:

-```
+```bash
 rbuild develop
 ```

-This will install the dependencies for development into the `deps` directory and
-configure `cmake` to use those dependencies in the `build` directory. These
-directories can be changed by passing the `--deps-dir` and `--build-dir` flags
-to `rbuild` command:
+This installs development dependencies in the `deps` directory and configures `cmake` to use those
+dependencies in the `build` directory. You can change these directories by passing the `--deps-dir` and
+`--build-dir` flags to the `rbuild` command:

-```
+```bash
 rbuild develop --build-dir build_rocm_55 --deps-dir /home/user/deps_dir
 ```

@@ -223,12 +227,12 @@ Depending on your setup `sudo` may be required for the pip install.

 All the code is formatted using clang-format. To format a file, use:

-```
+```bash
 clang-format-10 -style=file -i <path-to-source-file>
 ```

 Also, githooks can be installed to format the code per-commit:

-```
+```bash
 ./.githooks/install
 ```
--- a/cmake/Embed.cmake
+++ b/cmake/Embed.cmake
@@ -77,16 +77,17 @@ function(generate_embed_source EMBED_NAME)
        list(GET PARSE_FILES ${idx} FILE)

        set(START_SYMBOL "_binary_${SYMBOL}_start")
-        set(END_SYMBOL "_binary_${SYMBOL}_end")
+        set(LENGTH_SYMBOL "_binary_${SYMBOL}_length")
        if(EMBED_USE_LD)
            string(APPEND EXTERNS "
-                extern const char ${START_SYMBOL}[];
-                extern const char ${END_SYMBOL}[];
+extern const char ${START_SYMBOL}[];
+extern const size_t _binary_${SYMBOL}_size;
+const auto ${LENGTH_SYMBOL} = reinterpret_cast<size_t>(&_binary_${SYMBOL}_size);
            ")
        else()
            string(APPEND EXTERNS "
-                extern const char ${START_SYMBOL}[];
-                extern const char* ${END_SYMBOL};
+extern const char ${START_SYMBOL}[];
+extern const size_t ${LENGTH_SYMBOL};
            ")
        endif()

@@ -97,23 +98,22 @@ function(generate_embed_source EMBED_NAME)
        endif()

        string(APPEND INIT_KERNELS "
-            { \"${BASE_NAME}\", { ${START_SYMBOL}, ${END_SYMBOL}} },
-        ")
+            { \"${BASE_NAME}\", { ${START_SYMBOL}, ${LENGTH_SYMBOL}} },")
    endforeach()

    file(WRITE "${PARSE_HEADER}" "
+#include <string_view>
 #include <unordered_map>
-#include <string>
 #include <utility>
-const std::unordered_map<std::string, std::pair<const char*,const char*>>& ${EMBED_NAME}();
+std::unordered_map<std::string_view, std::string_view> ${EMBED_NAME}();
 ")

    file(WRITE "${PARSE_SRC}" "
 #include <${EMBED_NAME}.hpp>
 ${EXTERNS}
-const std::unordered_map<std::string, std::pair<const char*,const char*>>& ${EMBED_NAME}()
+std::unordered_map<std::string_view, std::string_view> ${EMBED_NAME}()
 {
-    static const std::unordered_map<std::string, std::pair<const char*,const char*>> result = {${INIT_KERNELS}};
+    static std::unordered_map<std::string_view, std::string_view> result = {${INIT_KERNELS}};
    return result;
 }
 ")
@@ -154,9 +154,10 @@ function(embed_file OUTPUT_FILE OUTPUT_SYMBOL FILE)
            # removes trailing comma
            string(REGEX REPLACE ", $" "" ARRAY_VALUES ${ARRAY_VALUES})
            file(WRITE "${OUT_FILE}" "
-                extern const char _binary_${SYMBOL}_start[] = { ${ARRAY_VALUES} };
-                extern const char* _binary_${SYMBOL}_end = _binary_${SYMBOL}_start + sizeof(_binary_${SYMBOL}_start);
-            \n")
+#include <cstddef>
+extern const char _binary_${SYMBOL}_start[] = { ${ARRAY_VALUES} };
+extern const size_t _binary_${SYMBOL}_length = sizeof(_binary_${SYMBOL}_start);
+")
        endif()
    endforeach()
 endfunction()
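The regenerated embed table pairs each file's start symbol with a length and exposes the result as `std::unordered_map<std::string_view, std::string_view>`. Below is a minimal hand-written sketch of what the non-`ld` branches of `embed_file()` and `generate_embed_source()` would produce for a single file. The file name `hello.txt` and the functions `hypothetical_embed()` and `hypothetical_get_file()` are illustrative assumptions, not names MIGraphX generates. The `ld` branch instead derives the length from the address of the linker-defined `_binary_<symbol>_size` symbol, and this sketch does not reproduce that branch.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for the source that embed_file() writes when
// EMBED_USE_LD is off: the bytes of a file "hello.txt" containing "hi\n",
// followed by its length. No NUL terminator is appended.
extern const char _binary_hello_txt_start[] = {0x68, 0x69, 0x0a};
extern const std::size_t _binary_hello_txt_length = sizeof(_binary_hello_txt_start);

// Hypothetical stand-in for the generated ${EMBED_NAME}() accessor.
std::unordered_map<std::string_view, std::string_view> hypothetical_embed()
{
    // Each entry maps a base name to a view over the embedded bytes;
    // std::string_view(const char*, size_t) does not need a terminator.
    static std::unordered_map<std::string_view, std::string_view> result = {
        {"hello.txt", {_binary_hello_txt_start, _binary_hello_txt_length}}};
    return result; // returned by value: copies the views, not the file bytes
}

// Looks up one embedded file. The returned view stays valid after the map
// copy is destroyed because it points at the static array above.
std::string_view hypothetical_get_file(std::string_view name)
{
    auto files = hypothetical_embed();
    auto it    = files.find(name);
    assert(it != files.end());
    return it->second;
}
```

Because the accessor now returns the map by value, each call copies the table of views but never the embedded bytes, so a view taken from a temporary copy remains usable.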

--- a/docs/.sphinx/requirements.txt
+++ b/docs/.sphinx/requirements.txt
@@ -35,7 +35,7 @@ fastjsonschema==2.16.3
    # via rocm-docs-core
 gitdb==4.0.10
    # via gitpython
-gitpython==3.1.32
+gitpython==3.1.37
    # via rocm-docs-core
 idna==3.4
    # via requests
@@ -75,7 +75,9 @@ pygments==2.15.0
    #   pydata-sphinx-theme
    #   sphinx
 pyjwt[crypto]==2.6.0
-    # via pygithub
+    # via
+    #   pygithub
+    #   pyjwt
 pynacl==1.5.0
    # via pygithub
 pyyaml==6.0
@@ -87,7 +89,7 @@ requests==2.28.2
    # via
    #   pygithub
    #   sphinx
-rocm-docs-core==0.24.2
+rocm-docs-core==0.27.0
    # via -r requirements.in
 smmap==5.0.0
    # via gitdb
@@ -130,7 +132,7 @@ sphinxcontrib-serializinghtml==1.1.5
    # via sphinx
 typing-extensions==4.5.0
    # via pydata-sphinx-theme
-urllib3==1.26.15
+urllib3==1.26.18
    # via requests
 wrapt==1.15.0
    # via deprecated
--- a/docs/contributor_guide.rst
+++ b/docs/contributor_guide.rst
 Contributor Guide
-===============
+=================

 .. toctree::
   :maxdepth: 2
   :caption: Contents:

-   dev_intro
+   dev/dev_intro
   dev/data
   dev/operators
   dev/program

--- a/docs/dev_intro.rst
+++ b/docs/dev_intro.rst
-MIGraphX Fundamentals
+Developer Introduction
 ======================

 MIGraphX provides an optimized execution engine for deep learning neural networks.

--- a/docs/driver.rst
+++ b/docs/driver.rst
 MIGraphX Driver
 ===============

+The MIGraphX driver is a tool that allows you to utilize many of the core functions of MIGraphX without having to write your own program. It can read, compile, run, and test the performance of a model with randomized data.
+
 read
 ----

@@ -17,6 +19,7 @@ compile

 Compiles and prints input graph.

+.. include:: ./driver/read.rst
 .. include:: ./driver/compile.rst

 run
@@ -26,6 +29,7 @@ run

 Loads and prints input graph.

+.. include:: ./driver/read.rst
 .. include:: ./driver/compile.rst

 perf
@@ -35,6 +39,7 @@ perf

 Compiles and runs input graph then prints performance report.

+.. include:: ./driver/read.rst
 .. include:: ./driver/compile.rst

 .. option::  --iterations, -n [unsigned int]
@@ -48,6 +53,7 @@ verify

 Runs reference and CPU or GPU implementations and checks outputs for consistency.

+.. include:: ./driver/read.rst
 .. include:: ./driver/compile.rst

 .. option::  --rms-tol [double]
@@ -71,7 +77,7 @@ Verify each instruction
 Reduce program and verify

 roctx
----
+-----

 .. program:: migraphx-driver roctx

@@ -86,4 +92,5 @@ An example command line combined with rocprof for tracing purposes is given belo
 After `rocprof` is run, the output directory will contain trace information for HIP, HCC and ROCTX in separate `.txt` files.
 To understand the interactions between API calls, it is recommended to utilize the `roctx.py` helper script as described in the :ref:`dev/tools:rocTX` section.

-.. include:: ./driver/compile.rst
\ No newline at end of file
+.. include:: ./driver/read.rst
+.. include:: ./driver/compile.rst
--- a/docs/driver/compile.rst
+++ b/docs/driver/compile.rst
-.. include:: ./driver/read.rst
-
 .. option::  --fill0 [std::vector<std::string>]

 Fill parameter with 0s

--- a/docs/driver/read.rst
+++ b/docs/driver/read.rst
@@ -46,11 +46,11 @@ Trim instructions from the end (Default: 0)

 Dim of a parameter (format: "@name d1 d2 dn")

-.. options:: --dyn-input-dim [std::vector<std::string>]
+.. option:: --dyn-input-dim [std::vector<std::string>]

 Set dynamic dimensions of a parameter using JSON formatting (format "@name" "dynamic_dimension_json")

-.. options:: --default-dyn-dim
+.. option:: --default-dyn-dim

 Set the default dynamic dimension (format {min:x, max:y, optimals:[o1,o2,...]})


--- a/docs/reference/py.rst
+++ b/docs/reference/py.rst
@@ -95,7 +95,7 @@ shape
    :rtype: bool

 dynamic_dimension
--------
+-----------------

 .. py:class:: dynamic_dimension(min, max, optimals)

@@ -326,7 +326,7 @@ op
 parse_onnx
 ----------

-.. py:function:: parse_onnx(filename, default_dim_value=1, map_input_dims={}, skip_unknown_operators=false, print_program_on_error=false, max_loop_iterations=10)
+.. py:function:: parse_onnx(filename, default_dim_value=1, map_input_dims={}, skip_unknown_operators=false, print_program_on_error=false, max_loop_iterations=10, limit_max_iterations=65535)

    Load and parse an onnx file.

@@ -337,7 +337,8 @@ parse_onnx
    :param list[dynamic_dimension] map_dyn_input_dims: Explicitly specify the dynamic_dimensions of an input.
    :param str skip_unknown_operators: Continue parsing onnx file if an unknown operator is found.
    :param str print_program_on_error: Print program if an error occurs.
-    :param int max_loop_iterations: Maximum iteration number for the loop operator.
+    :param int max_loop_iterations: Maximum iteration number for the loop operator if trip count is not set.
+    :param int limit_max_iterations: Maximum iteration limit for the loop operator.
    :rtype: program

 parse_tf

--- a/examples/migraphx/custom_op_miopen_kernel/custom_op_miopen_kernel.cpp
+++ b/examples/migraphx/custom_op_miopen_kernel/custom_op_miopen_kernel.cpp
@@ -32,7 +32,7 @@
 #define MIGRAPHX_MIOPEN_ASSERT(x) (assert((x) == miopenStatusSuccess))
 #define MIGRAPHX_HIP_ASSERT(x) (assert((x) == hipSuccess))

-inline miopenTensorDescriptor_t make_miopen_tensor(const migraphx::shape& s, bool pack = false)
+inline miopenTensorDescriptor_t make_miopen_tensor(const migraphx::shape& s)
 {
    miopenTensorDescriptor_t t;
    MIGRAPHX_MIOPEN_ASSERT(miopenCreateTensorDescriptor(&t));
@@ -49,23 +49,9 @@ inline miopenTensorDescriptor_t make_miopen_tensor(const migraphx::shape& s, boo
    else if(s.type() == migraphx_shape_int32_type)
        d = miopenInt32;
    else if(s.type() == migraphx_shape_int8_type)
-    {
-        if(pack)
-        {
-            // update the lens and corresponding strides
-            d          = miopenInt8x4;
-            lens[1]    = ((lens[1] + 3) / 4) * 4;
-            strides[0] = strides[1] * lens[1];
-        }
-        else
-        {
-            d = miopenInt8;
-        }
-    }
+        d = miopenInt8;
    else
-    {
        throw("MAKE_TENSOR: unsupported type");
-    }
    miopenSetTensorDescriptor(t, d, s_lens.size(), lens.data(), strides.data());
    return t;
 }

--- a/examples/migraphx/migraphx_driver/README.md
+++ b/examples/migraphx/migraphx_driver/README.md
@@ -149,9 +149,6 @@ gpu::gelu
 gpu::gelu_new
 gpu::gemm
 gpu::greater
-gpu::int8_conv_pack
-gpu::int8_gemm_pack_a
-gpu::int8_gemm_pack_b
 gpu::layernorm
 gpu::leaky_relu
 gpu::less

--- a/requirements.txt
+++ b/requirements.txt
@@ -21,12 +21,12 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 # THE SOFTWARE.
 #####################################################################################
-google/protobuf@v3.11.0 -DCMAKE_POSITION_INDEPENDENT_CODE=On -X subdir -Dprotobuf_BUILD_TESTS=Off
+google/protobuf@v3.19.0 -DCMAKE_POSITION_INDEPENDENT_CODE=On -X subdir -Dprotobuf_BUILD_TESTS=Off
 nlohmann/json@v3.8.0
 live-clones/blaze@v3.8 -X header -DHEADER_DIR=blaze -H sha256:d0ff011f47538285178908ea5f2cab46bb6a8f55b1edb6e03224a82dbc1a3212
 ROCmSoftwarePlatform/half@rocm-5.6.0
 pybind/pybind11@d159a563383d10c821ba7b2a71905d1207db6de4 --build
 msgpack/msgpack-c@cpp-3.3.0 -DMSGPACK_BUILD_TESTS=Off
-sqlite3@3.17 -DCMAKE_POSITION_INDEPENDENT_CODE=On
-ROCmSoftwarePlatform/composable_kernel@a22e479b8e1557961039db2d5c5ff89cff35e86b -DCK_BUILD_JIT_LIB=On -DCMAKE_POSITION_INDEPENDENT_CODE=On
-ROCmSoftwarePlatform/rocMLIR@a48dfb1f163fb0b38369e73e580968b72e85b594 -DBUILD_FAT_LIBROCKCOMPILER=On
+sqlite3@3.43.2 -DCMAKE_POSITION_INDEPENDENT_CODE=On
+ROCmSoftwarePlatform/composable_kernel@70eefcf4f263aa5c25f3c9ff0db8f6f199ef0fb9 -DCK_BUILD_JIT_LIB=On -DCMAKE_POSITION_INDEPENDENT_CODE=On
+ROCmSoftwarePlatform/rocMLIR@13f6c2a69cfe80a575c6b241ec7353d1e953cb12 -DBUILD_FAT_LIBROCKCOMPILER=On
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -155,6 +155,7 @@ register_migraphx_ops(
    identity
    if_op
    im2col
+    isinf
    isnan
    layout
    leaky_relu
@@ -174,6 +175,7 @@ register_migraphx_ops(
    mul
    multibroadcast
    multinomial
+    nearbyint
    neg
    nonmaxsuppression
    nonzero
@@ -204,7 +206,6 @@ register_migraphx_ops(
    rnn_last_hs_output
    rnn_var_sl_last_output
    roialign
-    round
    rsqrt
    run_on_target
    scalar
@@ -261,9 +262,8 @@ find_package(nlohmann_json 3.8.0 REQUIRED)
 target_link_libraries(migraphx PRIVATE nlohmann_json::nlohmann_json)
 migraphx_generate_export_header(migraphx)

-find_package(PkgConfig)
-pkg_check_modules(SQLITE3 REQUIRED IMPORTED_TARGET sqlite3)
-target_link_libraries(migraphx PRIVATE PkgConfig::SQLITE3)
+find_package(SQLite3 REQUIRED)
+target_link_libraries(migraphx PRIVATE SQLite::SQLite3)

 find_package(msgpackc-cxx QUIET)
 if(NOT msgpackc-cxx_FOUND)
@@ -282,7 +282,9 @@ add_subdirectory(driver)
 add_subdirectory(onnx)
 add_subdirectory(tf)

+if(MIGRAPHX_ENABLE_PYTHON)
 add_subdirectory(py)
+endif()
 add_subdirectory(targets/ref)
 target_link_libraries(migraphx_all_targets INTERFACE migraphx_ref)
 if(MIGRAPHX_ENABLE_CPU)

--- a/src/api/CMakeLists.txt
+++ b/src/api/CMakeLists.txt
@@ -32,6 +32,10 @@ migraphx_generate_export_header(migraphx_c DIRECTORY migraphx/api)
 # bumped when binary compatibility is broken. 
 rocm_set_soversion(migraphx_c 3.0)   

+if(BUILD_TESTING)
+    target_compile_definitions(migraphx_c PRIVATE MIGRAPHX_BUILD_TESTING)
+endif()
+
 rocm_clang_tidy_check(migraphx_c)
 target_link_libraries(migraphx_c PRIVATE migraphx migraphx_tf migraphx_onnx)


--- a/src/api/api.cpp
+++ b/src/api/api.cpp
@@ -38,26 +38,32 @@
 #include <migraphx/register_op.hpp>
 #include <migraphx/json.hpp>
 #include <migraphx/convert_to_json.hpp>
+#include <array>
 #include <algorithm>
 #include <cstdarg>
+
 namespace migraphx {

+#ifdef MIGRAPHX_BUILD_TESTING
 static thread_local bool disable_exception_catch = false; // NOLINT

 extern "C" MIGRAPHX_C_EXPORT void migraphx_test_private_disable_exception_catch(bool b)
 {
    disable_exception_catch = b;
 }
+#endif

 template <class F>
 migraphx_status try_(F f, bool output = true) // NOLINT
 {
+#ifdef MIGRAPHX_BUILD_TESTING
    if(disable_exception_catch)
    {
        f();
    }
    else
    {
+#endif
        try
        {
            f();
@@ -81,7 +87,9 @@ migraphx_status try_(F f, bool output = true) // NOLINT
        {
            return migraphx_status_unknown_error;
        }
+#ifdef MIGRAPHX_BUILD_TESTING
    }
+#endif
    return migraphx_status_success;
 }
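The `#ifdef MIGRAPHX_BUILD_TESTING` guards compile the `disable_exception_catch` hook, and the branch of `try_` that honors it, only into builds configured with `BUILD_TESTING`. A minimal standalone sketch of the same pattern follows. `HYPOTHETICAL_BUILD_TESTING`, `hypothetical_status`, `hypothetical_try()`, and `hypothetical_demo()` are illustrative names, and the sketch collapses the several `catch` clauses of `try_` into a single `catch(...)`.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical status codes standing in for migraphx_status.
enum hypothetical_status
{
    hypothetical_status_success,
    hypothetical_status_unknown_error
};

#ifdef HYPOTHETICAL_BUILD_TESTING
// Compiled only into test builds, mirroring MIGRAPHX_BUILD_TESTING.
static thread_local bool disable_exception_catch = false;
void set_disable_exception_catch(bool b) { disable_exception_catch = b; }
#endif

// Runs f and converts an escaping exception into an error status,
// unless the test-only bypass is compiled in and enabled.
template <class F>
hypothetical_status hypothetical_try(F f)
{
#ifdef HYPOTHETICAL_BUILD_TESTING
    if(disable_exception_catch)
    {
        f(); // let exceptions propagate so a test harness can see them
        return hypothetical_status_success;
    }
#endif
    try
    {
        f();
    }
    catch(...)
    {
        return hypothetical_status_unknown_error;
    }
    return hypothetical_status_success;
}

// Example: a throwing callable yields an error status instead of an exception.
void hypothetical_demo()
{
    auto status = hypothetical_try([] { throw std::runtime_error("boom"); });
    assert(status == hypothetical_status_unknown_error);
}
```

The sketch returns early from the bypass branch instead of splitting one `if`/`else` pair across two `#ifdef` blocks as `try_` does. With the macro undefined, both layouts reduce to the plain `try`/`catch` path.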

@@ -156,6 +164,11 @@ void set_default_loop_iterations(onnx_options& options, int64_t value)
    options.max_loop_iterations = value;
 }

+void set_limit_loop_iterations(onnx_options& options, int64_t value)
+{
+    options.limit_max_iterations = value;
+}
+
 void set_nhwc(tf_options& options, bool is_nhwc) { options.is_nhwc = is_nhwc; }

 void set_default_dim_value(tf_options& options, size_t value) { options.batch_size = value; }
@@ -1896,6 +1909,17 @@ migraphx_onnx_options_set_default_loop_iterations(migraphx_onnx_options_t onnx_o
    return api_error_result;
 }

+extern "C" migraphx_status
+migraphx_onnx_options_set_limit_loop_iterations(migraphx_onnx_options_t onnx_options, int64_t value)
+{
+    auto api_error_result = migraphx::try_([&] {
+        if(onnx_options == nullptr)
+            MIGRAPHX_THROW(migraphx_status_bad_param, "Bad parameter onnx_options: Null pointer");
+        migraphx::set_limit_loop_iterations((onnx_options->object), (value));
+    });
+    return api_error_result;
+}
+
 extern "C" migraphx_status migraphx_file_options_destroy(migraphx_file_options_t file_options)
 {
    auto api_error_result = migraphx::try_([&] { destroy((file_options)); });

--- a/src/api/include/migraphx/migraphx.h
+++ b/src/api/include/migraphx/migraphx.h
@@ -26,6 +26,7 @@

 #include <stdlib.h>
 #include <stdbool.h>
+#include <stdint.h>

 #include <migraphx/api/export.h>

@@ -513,6 +514,9 @@ MIGRAPHX_C_EXPORT migraphx_status migraphx_onnx_options_set_default_dyn_dim_valu
 MIGRAPHX_C_EXPORT migraphx_status migraphx_onnx_options_set_default_loop_iterations(
    migraphx_onnx_options_t onnx_options, int64_t value);

+MIGRAPHX_C_EXPORT migraphx_status migraphx_onnx_options_set_limit_loop_iterations(
+    migraphx_onnx_options_t onnx_options, int64_t value);
+
 MIGRAPHX_C_EXPORT migraphx_status
 migraphx_file_options_destroy(migraphx_file_options_t file_options);
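The documentation now distinguishes `max_loop_iterations` (used when a loop has no trip count) from `limit_max_iterations` (an upper bound on loop iterations, settable from C through `migraphx_onnx_options_set_limit_loop_iterations`). The sketch below shows one plausible way the two could combine. It is an assumption for illustration, not MIGraphX's parser logic, and `hypothetical_loop_options` and `hypothetical_effective_iterations()` are invented names. The defaults (10 and 65535) are taken from the Python `parse_onnx` signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical mirror of the two documented ONNX loop options.
struct hypothetical_loop_options
{
    std::int64_t max_loop_iterations  = 10;    // fallback when no trip count is set
    std::int64_t limit_max_iterations = 65535; // cap applied to any iteration count
};

// ASSUMED semantics (not taken from MIGraphX): use the Loop node's trip count
// when present, otherwise max_loop_iterations, then cap at limit_max_iterations.
std::int64_t hypothetical_effective_iterations(const hypothetical_loop_options& opts,
                                               std::optional<std::int64_t> trip_count)
{
    assert(opts.limit_max_iterations >= 0);
    const std::int64_t requested = trip_count.value_or(opts.max_loop_iterations);
    return std::min(requested, opts.limit_max_iterations);
}
```

Under this assumed reading, raising `max_loop_iterations` above `limit_max_iterations` has no effect.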