* Added support for ONNX version 1.14.1
* Added new operators: `QLinearAdd`, `QLinearGlobalAveragePool`, `QLinearConv`, `Shrink`, `CastLike`,
and `RandomUniform`
* Added an error message for when `gpu_targets` is not set during MIGraphX compilation
* Added parameter to set tolerances with `migraphx-driver` verify
* Added support for MXR files > 4 GB
* Added `MIGRAPHX_TRACE_MLIR` flag
* BETA added capability for using ROCm Composable Kernels via the `MIGRAPHX_ENABLE_CK=1`
environment variable
### Optimizations
* Improved performance support for INT8
* Improved time precision while benchmarking candidate kernels from CK or MLIR
* Removed contiguous from reshape parsing
* Updated the `ConstantOfShape` operator to support Dynamic Batch
* Simplified dynamic shapes-related operators to their static versions, where possible
* Improved debugging tools for accuracy issues
* Included a print warning about `miopen_fusion` while generating `mxr`
* General reduction in system memory usage during model compilation
* Created additional fusion opportunities during model compilation
* Improved debugging for matchers
* Improved general debug messages
### Fixes
* Fixed scatter operator for nonstandard shapes with some models from ONNX Model Zoo
* Provided a compile option to improve the accuracy of some models by disabling Fast-Math
* Improved layernorm + pointwise fusion matching to ignore argument order
* Fixed accuracy issue with `ROIAlign` operator
* Fixed computation logic for the `Trilu` operator
* Fixed support for the DETR model
### Changes
* Changed MIGraphX version to 2.8
* Extracted the test packages into a separate deb file when building MIGraphX from source
### Removals
* Removed building Python 2.7 bindings
## MIGraphX 2.7 for ROCm 5.7.0
### Additions
* hipRTC no longer requires dev packages for MIGraphX runtime and allows the ROCm install to be in a
different directory than it was at build time
* Added support for multi-target execution
* Added Dynamic Batch support with C++/Python APIs
* Added `migraphx.create_argument` to Python API
* Added dockerfile example for Ubuntu 22.04
* Added TensorFlow supported ops in the driver, similar to the existing ONNX operator list
* Added a `MIGRAPHX_TRACE_MATCHES_FOR` environment variable to filter the matcher trace
* Improved debugging by printing max, min, mean, and stddev values for `TRACE_EVAL=2`
* You can now use the `fast_math` flag instead of an environment flag for GELU
* The driver now prints a message if offload copy is set for a compiled program
### Optimizations
* Optimized for ONNX Runtime 1.14.0
* Improved compile times by only building for the GPU on the system
* Improved performance of pointwise/reduction kernels when using NHWC layouts
* Loaded specific version of the `migraphx_py` library
* Annotated functions with the block size so the compiler can do a better job of optimizing
* Enabled reshape on nonstandard shapes
* Used half HIP APIs to compute max and min
* Added support for broadcasted scalars to unsqueeze operator
* Improved multiplies with dot operator
* Handled broadcasts across dot and concat
* Added verify namespace for better symbol resolution
### Fixes
* Resolved accuracy issues with FP16 resnet50
* Updated cpp generator to handle inf from float
* Fixed assertion error during verify and made DCE work with tuples
* Fixed convert operation for NaNs
* Fixed shape typo in API test
* Fixed compile warnings for shadowing variable names
* Added missing specialization for the `nullptr` hash function
### Changes
* Bumped version of half library to 5.6.0
* Bumped CI to support ROCm 5.6
* Made building tests optional
* Replaced `np.bool` with `bool` per NumPy request
### Removals
* Removed int8x4 rocBlas calls due to deprecation
* Removed `std::reduce` usage because not all operating systems support it
## MIGraphX 2.5 for ROCm 5.5.0
### Additions
* Added the Y-Model feature to store tuning information with the optimized model
* Added Python 3.10 bindings
* Added an accuracy checker tool based on ONNX Runtime
* Added the ONNX operators `parse_split` and `Trilu`
* Added build support for ROCm MLIR
* Added the `migraphx-driver` flag to print optimizations in Python (`--python`)
* Added JIT implementation of the Gather and Pad operators, which results in better handling for
larger tensor sizes
### Optimizations
### Optimizations
* Improved performance of Transformer-based models
* Improved performance of the `Pad`, `Concat`, `Gather`, and `Pointwise` operators
* Improved ONNX/pb file loading speed
* Added a general optimize pass that runs several passes, such as `simplify_reshapes`, algebra, and DCE
in a loop
### Fixes
* Improved parsing for TensorFlow Protobuf files
* Resolved various accuracy issues with some ONNX models
* Resolved a gcc-12 issue with MIVisionX
* Improved support for larger sized models and batches
* Used `--offload-arch` instead of `--cuda-gpu-arch` for the HIP compiler
* Changed JIT to use a float accumulator for large reduce ops of half type to avoid overflow
* Changed JIT to temporarily use cosine to compute the sine function
### Changes
* Changed version and location of third-party build dependencies in order to pick up fixes
MIGraphX provides an optimized execution engine for deep learning neural networks.
...
Adding Two Literals
--------------------
A program is a collection of modules, which are collections of instructions to be executed when calling :cpp:any:`eval <migraphx::internal::program::eval>`.
Each instruction has an associated :cpp:any:`operation <migraphx::internal::operation>` which represents the computation to be performed by the instruction.
We start with a snippet of the simple ``add_two_literals()`` function::
...
We start by creating a simple :cpp:any:`migraphx::program <migraphx::internal::program>` object and then getting a pointer to the main module of it.
The program is a collection of ``modules`` that start executing from the main module, so instructions are added to the modules rather than directly onto the program object.
We then use the :cpp:any:`add_literal <migraphx::internal::program::add_literal>` function to add an instruction that stores the literal number ``1`` while returning an :cpp:any:`instruction_ref <migraphx::internal::instruction_ref>`.
The returned :cpp:any:`instruction_ref <migraphx::internal::instruction_ref>` can be used in another instruction as an input.
We use the same :cpp:any:`add_literal <migraphx::internal::program::add_literal>` function to add a ``2`` to the program.
After creating the literals, we then create the instruction to add the numbers together.
This is done by using the :cpp:any:`add_instruction <migraphx::internal::program::add_instruction>` function with the ``"add"`` :cpp:any:`operation <migraphx::internal::program::operation>` created by :cpp:any:`make_op <migraphx::internal::program::make_op>` along with the previous `add_literal` :cpp:any:`instruction_ref <migraphx::internal::instruction_ref>` for the input arguments of the instruction.
Finally, we can run this :cpp:any:`program <migraphx::internal::program>` by compiling it for the reference target (CPU) and then running it with :cpp:any:`eval <migraphx::internal::program::eval>`.
The result is then retrieved and printed to the console.
We can compile the program for the GPU as well, but the file will have to be moved to the ``test/gpu/`` directory and the correct target must be included::
...
p.compile(migraphx::ref::target{});
This adds a parameter of type ``int32``, and compiles it for the CPU.
To run the program, we need to pass the parameter as a ``parameter_map`` when we call :cpp:any:`eval <migraphx::internal::program::eval>`.
We create the ``parameter_map`` by setting the ``x`` key to an :cpp:any:`argument <migraphx::internal::argument>` object with an ``int`` data type::
// create a parameter_map object for passing a value to the "x" parameter
std::vector<int> data = {4};
...
Handling Tensor Data
---------------------
In the previous examples we have only been dealing with scalars, but the :cpp:any:`shape <migraphx::internal::shape>` class can describe multi-dimensional tensors.
For example, we can compute a simple convolution::
migraphx::program p;
...
Here we create two parameters for both the ``input`` and ``weights``.
In the previous examples, we created simple literals; however, most programs take data from allocated buffers (usually on the GPU).
In this case, we can create :cpp:any:`argument <migraphx::internal::argument>` objects directly from the pointers to the buffers::
// Compile the program
p.compile(migraphx::ref::target{});
...
Set to "1", "enable", "enabled", "yes", or "true" to use.
Disables the conversion from fp16 to fp32 for the InstanceNormalization ONNX operator, which MIGraphX performs as a workaround for accuracy issues with reduce_mean/variance.
See ``parse_instancenorm.cpp`` for more details.
Matchers
------------
.. envvar:: MIGRAPHX_TRACE_MATCHES
Set to "1" to print the matcher that matches an instruction and the matched instruction.
Set to "2" and use the ``MIGRAPHX_TRACE_MATCHES_FOR`` flag to filter out results.
.. envvar:: MIGRAPHX_TRACE_MATCHES_FOR
Set to the name of any matcher and only traces for that matcher will be printed out.
.. envvar:: MIGRAPHX_VALIDATE_MATCHES
Set to "1", "enable", "enabled", "yes", or "true" to use.
Validate the module after finding the matches (runs ``module.validate()``).
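As an illustration, the matcher variables above can be combined to trace a single matcher. The matcher name and the commented-out driver invocation below are placeholders, not values taken from this document:

```shell
# Trace matches for one matcher only, and validate the module after matching.
export MIGRAPHX_TRACE_MATCHES=2                  # "2" enables filtering via the _FOR flag
export MIGRAPHX_TRACE_MATCHES_FOR=find_softmax   # hypothetical matcher name
export MIGRAPHX_VALIDATE_MATCHES=1
# migraphx-driver compile model.onnx             # placeholder model path; uncomment to run
echo "tracing matcher: $MIGRAPHX_TRACE_MATCHES_FOR"
```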
Program Execution
---------------------
.. envvar:: MIGRAPHX_TRACE_EVAL
Set to "1", "2", or "3" to use.
"1" prints the instruction run and the time taken.
"2" prints everything in "1" and a snippet of the output argument and some statistics (ex. min, max, mean) of the output.
"3" prints everything in "1" and the full output buffers.
Program Verification
------------------------
.. envvar:: MIGRAPHX_VERIFY_ENABLE_ALLCLOSE
Set to "1", "enable", "enabled", "yes", or "true" to use.
Uses ``allclose`` with the given ``atol`` and ``rtol`` for verifying ranges with ``driver verify`` or the tests that use ``migraphx/verify.hpp``.
Pass debugging and controls
-----------------------------------
.. envvar:: MIGRAPHX_TRACE_ELIMINATE_CONTIGUOUS
Set to "1", "enable", "enabled", "yes", or "true" to use.
Debug print the instructions that have input ``contiguous`` instructions removed.
.. envvar:: MIGRAPHX_DISABLE_POINTWISE_FUSION
Set to "1", "enable", "enabled", "yes", or "true" to use.
Disables the ``fuse_pointwise`` compile pass.
.. envvar:: MIGRAPHX_DEBUG_MEMORY_COLORING
Set to "1", "enable", "enabled", "yes", or "true" to use.
Print debug statements for the ``memory_coloring`` pass.
.. envvar:: MIGRAPHX_TRACE_SCHEDULE
Set to "1", "enable", "enabled", "yes", or "true" to use.
Print debug statements for the ``schedule`` pass.
.. envvar:: MIGRAPHX_TRACE_PROPAGATE_CONSTANT
Set to "1", "enable", "enabled", "yes", or "true" to use.
Traces instructions replaced with a constant.
.. envvar:: MIGRAPHX_8BITS_QUANTIZATION_PARAMS
Set to "1", "enable", "enabled", "yes", or "true" to use.
Print the quantization parameters in only the main module.
Set to "1", "enable", "enabled", "yes", or "true" to use.
Disable the DNNL post ops workaround.
.. envvar:: MIGRAPHX_DISABLE_MIOPEN_FUSION
Set to "1", "enable", "enabled", "yes", or "true" to use.
Disable MIOpen fusions.
.. envvar:: MIGRAPHX_DISABLE_SCHEDULE_PASS
Set to "1", "enable", "enabled", "yes", or "true" to use.
Disable the ``schedule`` pass.
.. envvar:: MIGRAPHX_DISABLE_REDUCE_FUSION
Set to "1", "enable", "enabled", "yes", or "true" to use.
Disable the ``fuse_reduce`` pass.
.. envvar:: MIGRAPHX_ENABLE_NHWC
Set to "1", "enable", "enabled", "yes", or "true" to use.
Enable the ``layout_nhwc`` pass.
.. envvar:: MIGRAPHX_ENABLE_CK
Set to "1", "enable", "enabled", "yes", or "true" to use.
Enable using the Composable Kernels library.
Should be used in conjunction with ``MIGRAPHX_DISABLE_MLIR=1``.
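A minimal sketch of enabling Composable Kernels together with ``MIGRAPHX_DISABLE_MLIR=1``, as the note above suggests. The driver call is a commented-out placeholder:

```shell
# Prefer Composable Kernels over rocMLIR for applicable kernels.
export MIGRAPHX_ENABLE_CK=1
export MIGRAPHX_DISABLE_MLIR=1
# migraphx-driver perf model.onnx   # placeholder model path; uncomment to run
echo "CK enabled=$MIGRAPHX_ENABLE_CK, MLIR disabled=$MIGRAPHX_DISABLE_MLIR"
```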
.. envvar:: MIGRAPHX_DISABLE_MLIR
Set to "1", "enable", "enabled", "yes", or "true" to use.
Disable using the rocMLIR library.
.. envvar:: MIGRAPHX_ENABLE_EXTRA_MLIR
Set to "1", "enable", "enabled", "yes", or "true" to use.
Enables additional opportunities to use MLIR that may improve performance.
.. envvar:: MIGRAPHX_COPY_LITERALS
Set to "1", "enable", "enabled", "yes", or "true" to use.
Use ``hip_copy_to_gpu`` with a new ``literal`` instruction rather than use ``hip_copy_literal{}``.
Compilation traces
----------------------
.. envvar:: MIGRAPHX_TRACE_FINALIZE
Set to "1", "enable", "enabled", "yes", or "true" to use.
Debug print instructions during the ``module.finalize()`` step.
.. envvar:: MIGRAPHX_TRACE_COMPILE
Set to "1", "enable", "enabled", "yes", or "true" to use.
Print trace information for the graph compilation process.
.. envvar:: MIGRAPHX_TRACE_PASSES
Set to "1", "enable", "enabled", "yes", or "true" to use.
Print the compile pass and the program after the pass.
.. envvar:: MIGRAPHX_TIME_PASSES
Set to "1", "enable", "enabled", "yes", or "true" to use.
Time the compile passes.
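For example, the two pass-related variables above might be combined when profiling compilation; the driver invocation and log path are illustrative placeholders:

```shell
# Print the program after each compile pass and time every pass.
export MIGRAPHX_TRACE_PASSES=1
export MIGRAPHX_TIME_PASSES=1
# migraphx-driver compile model.onnx 2> passes.log   # placeholder invocation
echo "trace passes=$MIGRAPHX_TRACE_PASSES, time passes=$MIGRAPHX_TIME_PASSES"
```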
GPU kernel JIT compilation debugging (applies to both hipRTC and hip-clang)
---------------------------------------------------------------------------
.. envvar:: MIGRAPHX_TRACE_CMD_EXECUTE
Set to "1", "enable", "enabled", "yes", or "true" to use.
Print commands executed by the MIGraphX ``process``.
.. envvar:: MIGRAPHX_TRACE_HIPRTC
Set to "1", "enable", "enabled", "yes", or "true" to use.
Print HIPRTC options and C++ file executed.
.. envvar:: MIGRAPHX_DEBUG_SAVE_TEMP_DIR
Set to "1", "enable", "enabled", "yes", or "true" to use.
Prevents the created temporary directories from being deleted.
.. envvar:: MIGRAPHX_GPU_DEBUG
Set to "1", "enable", "enabled", "yes", or "true" to use.
Internally, this adds the option ``-DMIGRAPHX_DEBUG`` when compiling GPU kernels. It enables assertions and capture of source locations for the errors.
.. envvar:: MIGRAPHX_GPU_DEBUG_SYM
Set to "1", "enable", "enabled", "yes", or "true" to use.
Adds the option ``-g`` when compiling HIPRTC.
.. envvar:: MIGRAPHX_GPU_DUMP_SRC
Set to "1", "enable", "enabled", "yes", or "true" to use.
Dump the HIPRTC source files compiled.
.. envvar:: MIGRAPHX_GPU_DUMP_ASM
Set to "1", "enable", "enabled", "yes", or "true" to use.
Dump the hip-clang assembly.
.. envvar:: MIGRAPHX_GPU_OPTIMIZE
Set the optimization mode for GPU compile (``-O`` option).
Defaults to ``-O3``.
.. envvar:: MIGRAPHX_GPU_COMPILE_PARALLEL
Set to the number of threads to use.
Compile GPU code in parallel with the given number of threads.
.. envvar:: MIGRAPHX_TRACE_NARY
Set to "1", "enable", "enabled", "yes", or "true" to use.
Print the ``nary`` device functions used.
.. envvar:: MIGRAPHX_ENABLE_HIPRTC_WORKAROUNDS
Set to "1", "enable", "enabled", "yes", or "true" to use.
Enable HIPRTC workarounds for bugs in HIPRTC.
.. envvar:: MIGRAPHX_USE_FAST_SOFTMAX
Set to "1", "enable", "enabled", "yes", or "true" to use.
Use the fast softmax optimization.
.. envvar:: MIGRAPHX_ENABLE_NULL_STREAM
Set to "1", "enable", "enabled", "yes", or "true" to use.
Allows using a null stream for MIOpen and hipStream.
.. envvar:: MIGRAPHX_NSTREAMS
Set to the number of streams to use.
Defaults to 1.
.. envvar:: MIGRAPHX_TRACE_BENCHMARKING
Set to "1" to print the benchmarking trace.
Set to "2" to print the benchmarking trace with more detail.
MLIR vars
-------------
.. envvar:: MIGRAPHX_TRACE_MLIR
Set to "1" to trace MLIR and print any failures.
Set to "2" to additionally print all MLIR operations.
.. envvar:: MIGRAPHX_MLIR_USE_SPECIFIC_OPS
Set to the names of the operations that should always use MLIR, regardless of GPU architecture.
Accepts a list of operators separated by commas (ex: "fused", "convolution", "dot").
.. envvar:: MIGRAPHX_MLIR_TUNING_DB
Set to the path of the MLIR tuning database to load.
.. envvar:: MIGRAPHX_MLIR_TUNING_CFG
Set to the path of the tuning configuration.
Appends to a tuning configuration file that can be used with rocMLIR tuning scripts.
.. envvar:: MIGRAPHX_MLIR_TUNE_EXHAUSTIVE
Set to "1", "enable", "enabled", "yes", or "true" to use.
Do exhaustive tuning for MLIR.
.. envvar:: MIGRAPHX_MLIR_TUNE_LIMIT
Set to an integer greater than 1.
Limits the number of solutions that MLIR will use for tuning.
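A sketch combining the MLIR variables above to force MLIR for selected operations and tune them; the operation list mirrors the example given for ``MIGRAPHX_MLIR_USE_SPECIFIC_OPS``, and the database path is a placeholder:

```shell
# Force MLIR for selected operations and enable exhaustive tuning.
export MIGRAPHX_MLIR_USE_SPECIFIC_OPS="fused,convolution,dot"
export MIGRAPHX_MLIR_TUNE_EXHAUSTIVE=1
export MIGRAPHX_MLIR_TUNING_DB=/tmp/mlir_tuning.db   # placeholder database path
echo "MLIR ops: $MIGRAPHX_MLIR_USE_SPECIFIC_OPS"
```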
CK vars
-----------
.. envvar:: MIGRAPHX_LOG_CK_GEMM
Set to "1", "enable", "enabled", "yes", or "true" to use.
Print Composable Kernels GEMM traces.
.. envvar:: MIGRAPHX_CK_DEBUG
Set to "1", "enable", "enabled", "yes", or "true" to use.
Always adds ``-DMIGRAPHX_CK_CHECK=1`` when compiling Composable Kernels operators.
.. envvar:: MIGRAPHX_TUNE_CK
Set to "1", "enable", "enabled", "yes", or "true" to use.
Use tuning for Composable Kernels.
Testing
------------
.. envvar:: MIGRAPHX_TRACE_TEST_COMPILE
Set to the target that you want to trace the compilation of (ex. "gpu", "cpu").
Prints the compile trace for the given target for the verify tests.
This flag shouldn't be used in conjunction with ``MIGRAPHX_TRACE_COMPILE``.
For the verify tests only use ``MIGRAPHX_TRACE_TEST_COMPILE``.
.. envvar:: MIGRAPHX_TRACE_TEST
Set to "1", "enable", "enabled", "yes", or "true" to use.
Prints the reference and target programs even if the verify passed successfully.
.. envvar:: MIGRAPHX_DUMP_TEST
Set to "1", "enable", "enabled", "yes", or "true" to use.
The MIGraphX driver is a tool that allows you to utilize many of the core functions of MIGraphX without having to write your own program. It can read, compile, run, and test the performance of a model with randomized data.
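A sketch of a typical driver workflow using the subcommands documented in this section. The model path and flags are illustrative placeholders, so the driver calls are left commented out:

```shell
# migraphx-driver read model.onnx          # parse and print the graph
# migraphx-driver compile model.onnx       # compile and print the graph
# migraphx-driver perf model.onnx          # benchmark with randomized data
workflow="read compile perf"
echo "driver workflow sketch: $workflow"
```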
read
----
...
Compiles and prints input graph.
.. include:: ./driver/read.rst
.. include:: ./driver/compile.rst
run
...
Loads and prints input graph.
.. include:: ./driver/read.rst
.. include:: ./driver/compile.rst
perf
...
Compiles and runs input graph then prints performance report.
.. include:: ./driver/read.rst
.. include:: ./driver/compile.rst
.. option:: --iterations, -n [unsigned int]
...
Runs reference and CPU or GPU implementations and checks outputs for consistency.
.. include:: ./driver/read.rst
.. include:: ./driver/compile.rst
.. option:: --rms-tol [double]
...
Reduce program and verify
roctx
-----
.. program:: migraphx-driver roctx
...
After ``rocprof`` is run, the output directory will contain trace information for HIP, HCC, and ROCTX in separate ``.txt`` files.
To understand the interactions between API calls, use the ``roctx.py`` helper script as described in the :ref:`dev/tools:rocTX` section.