- 25 Apr, 2023 1 commit
-
-
Chris Austen authored
-
- 24 Apr, 2023 3 commits
-
-
Charlie Lin authored
Updates the hip::copy_to_gpu and hip::copy_from_gpu operators to work with dynamic shapes. Allows offload_copy to be used with dynamic batch. Changes the assert in select_module because the argument might now be smaller, given how offload_copy works with dynamic batch (the maximum buffer size is used).
-
Paul Fultz II authored
This fixes #1700
-
Paul Fultz II authored
-
- 21 Apr, 2023 1 commit
-
-
Umang Yadav authored
-
- 13 Apr, 2023 1 commit
-
-
Zhuoran Yin authored
-
- 11 Apr, 2023 1 commit
-
-
Paul Fultz II authored
-
- 09 Apr, 2023 1 commit
-
-
Paul Fultz II authored
* Enable hiprtc by default
-
- 06 Apr, 2023 2 commits
-
-
Charlie Lin authored
Examples:
bin/driver verify /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim @data "[{min:1, max:4}, 3, 224, 224]"
bin/driver compile /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --default-dyn-dim "{min:1, max:10}" --output resnet50_batch1-10.mxr
bin/driver perf resnet50_batch1-10.mxr --batch 4
-
Paul Fultz II authored
Automatically fuse multiple reductions and pointwise operations.
-
- 05 Apr, 2023 1 commit
-
-
Paul Fultz II authored
This will replace conv(x+a, w) with conv(x, w) + conv(a, w) where a is a constant so conv(a, w) can be replaced with a constant.
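The rewrite relies on convolution being linear in its input. A minimal sketch of that identity, using a hypothetical pure-Python 1-D `conv1d` helper (not the MIGraphX operator) and integer data so the comparison is exact:

```python
def conv1d(x, w):
    """Valid-mode 1-D cross-correlation of sequence x with kernel w."""
    n = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n)]


x = [1, 4, 2, 7, 3]        # runtime input
a = [10, 20, 30, 40, 50]   # constant added to the input
w = [2, -1, 3]             # convolution weights

# Original form: conv(x + a, w)
fused = conv1d([xi + ai for xi, ai in zip(x, a)], w)

# Rewritten form: conv(x, w) + conv(a, w).
# conv(a, w) depends only on constants, so it can be computed once ahead of time.
folded_const = conv1d(a, w)
split = [p + q for p, q in zip(conv1d(x, w), folded_const)]

assert fused == split
```

With floating-point data the two forms may differ by rounding error, so a tolerance-based comparison would be needed there.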
-
- 03 Apr, 2023 1 commit
-
-
Charlie Lin authored
Adds the promote_literals compiler pass, which moves literals from the submodules to the main module. Combined with the eliminate_common_subexpression pass, it removes the copies of literals created during split_single_dyn_dim. The pass is enabled with the split_single_dyn_dim compile option.
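As a rough illustration of why hoisting literals helps, the sketch below models modules as plain Python dicts. The data layout and both helper functions are hypothetical stand-ins, not the MIGraphX pass implementations: literals are first moved into the main module, after which identical copies become visible to a deduplication step.

```python
def promote_literals(main_literals, submodules):
    """Move every literal out of each submodule into main_literals (in place)."""
    for sub in submodules:
        main_literals.extend(sub["literals"])
        sub["literals"] = []


def eliminate_common_literals(literals):
    """Keep the first occurrence of each distinct literal value, preserving order."""
    seen = []
    for lit in literals:
        if lit not in seen:
            seen.append(lit)
    return seen


# Two submodules produced by splitting, each holding its own copy of the same weights.
weights = (0.5, -1.0, 2.0)
bias = (0.1,)
submodules = [
    {"name": "batch_1", "literals": [weights, bias]},
    {"name": "batch_2", "literals": [weights, bias]},
]

main_literals = []
promote_literals(main_literals, submodules)
promoted_count = len(main_literals)
main_literals = eliminate_common_literals(main_literals)

assert promoted_count == 4
assert main_literals == [weights, bias]
```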
-
- 31 Mar, 2023 1 commit
-
-
Charlie Lin authored
Adds a new GPU compiler pass, split_single_dyn_dim, that handles the case where one input parameter has a single non-fixed dynamic_dimension. This commonly occurs for dynamic batch or BERT sequence length. The pass splits the dynamic shape into several submodules with static input parameters to handle all of the cases in the dynamic_dimension range. Essentially does what I manually did for the select_module verify tests. Adds a compile option, split_single_dyn_dim, that toggles the pass on or off; it defaults to false. Updates verify_program.hpp and run_verify.cpp to allow the tests to change the compile_options.
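The shape enumeration behind this idea can be sketched as follows. The function name mirrors the pass, but the representation (a list of ints with one `{"min", "max"}` dict) is an assumption made for illustration and is not the MIGraphX shape API:

```python
def split_single_dyn_dim(shape):
    """Expand a shape with exactly one ranged dimension into all static shapes.

    shape is a list whose entries are ints (fixed dimensions) or a dict
    {"min": lo, "max": hi} (the single non-fixed dimension).
    """
    dyn = [i for i, d in enumerate(shape) if isinstance(d, dict)]
    if len(dyn) != 1:
        raise ValueError("expected exactly one non-fixed dynamic dimension")
    idx = dyn[0]
    lo, hi = shape[idx]["min"], shape[idx]["max"]
    # One static shape per value in the inclusive range [lo, hi].
    return [tuple(shape[:idx] + [n] + shape[idx + 1:]) for n in range(lo, hi + 1)]


static_shapes = split_single_dyn_dim([{"min": 1, "max": 4}, 3, 224, 224])
for s in static_shapes:
    print(s)
```

Each resulting static shape would correspond to one submodule, so a wide range trades compile time and memory for the ability to use static-shape kernels.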
-
- 30 Mar, 2023 1 commit
-
-
Paul Fultz II authored
* Add hiprtc driver
-
- 29 Mar, 2023 1 commit
-
-
Paul Fultz II authored
-
- 28 Mar, 2023 1 commit
-
-
Umang Yadav authored
* Remove version from check_context and bump program version
-
- 27 Mar, 2023 1 commit
-
-
Manupa Karunaratne authored
* [MLIR] add dot offloads with manual tuning support
* This commit adds dot + pointwise fusion support along with manual tuning using rocMLIR.
-
- 25 Mar, 2023 1 commit
-
-
Umang Yadav authored
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
-
- 21 Mar, 2023 1 commit
-
-
Charlie Lin authored
Refactor to have select_module use output parameters. Disable select_module verify tests on cpu.
-
- 18 Mar, 2023 1 commit
-
-
Umang Yadav authored
Fixes #1595
-
- 13 Mar, 2023 1 commit
-
-
Manupa Karunaratne authored
* [MLIR] Adds a runtime switch to trigger MLIR
-
- 10 Mar, 2023 2 commits
-
-
Paul Fultz II authored
-
Paul Fultz II authored
-
- 01 Mar, 2023 1 commit
-
-
Charlie Lin authored
Add additional documentation to explain the passes.
-
- 28 Feb, 2023 1 commit
-
-
Charlie Lin authored
Creates the select_module operator, which selects one of the submodules passed to it to run based on the submodule parameters. A submodule is selected when the static shapes of the arguments to select_module exactly match the shapes of that submodule's parameters.
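The exact-match selection rule can be sketched as a dictionary lookup keyed by parameter shapes. The function below and its dict-of-callables layout are hypothetical and only illustrate the rule; they are not the operator's actual interface:

```python
def select_module(submodules, arg_shapes):
    """Return the submodule whose parameter shapes equal arg_shapes exactly.

    submodules maps a tuple of parameter shapes to a callable standing in
    for a compiled submodule.
    """
    key = tuple(arg_shapes)
    if key not in submodules:
        raise KeyError(f"no submodule with parameter shapes {key}")
    return submodules[key]


# One submodule per static batch size; each takes a single (batch, 3) parameter.
submodules = {
    ((1, 3),): lambda: "batch1",
    ((2, 3),): lambda: "batch2",
}

chosen = select_module(submodules, [(2, 3)])
assert chosen() == "batch2"
```

Because the match is exact, an argument shape outside the set of submodule shapes has no fallback in this sketch and raises an error.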
-
- 23 Feb, 2023 1 commit
-
-
shivadbhavsar authored
-
- 16 Feb, 2023 3 commits
-
-
Paul Fultz II authored
Avoids double global loads. Strided loops are unrolled, which lets results be stored in an array that the compiler will keep in registers since the index access is constant. Updated to handle large reductions, which gives a better stable diffusion result.
-
Umang Yadav authored
* deprecate HCC
-
Umang Yadav authored
* Add driver flag "--exhaustive-tune" to enable tuning, add support for the same in C/C++ and python API
-
- 14 Feb, 2023 1 commit
-
-
shivadbhavsar authored
Currently, we default to device 0 when loading programs. Updating this to use hipGetDevice to set the device for the loaded program.
-
- 10 Feb, 2023 1 commit
-
-
Umang Yadav authored
-
- 06 Feb, 2023 1 commit
-
-
Paul Fultz II authored
* Fuse layernorm with different patterns
* Only match when using the last axis
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
-
- 31 Jan, 2023 2 commits
-
-
Umang Yadav authored
Added a CMake flag for hipRTC, MIGRAPHX_USE_HIPRTC. Added stages in Jenkins for hipRTC. Fixes for some of the pending issues from hipRTC.
-
Paul Fultz II authored
* Add general optimize pass
* Fuse gemm multiplies by scalar
* Handle zero epsilon
-
- 19 Jan, 2023 1 commit
-
-
Paul Fultz II authored
This prevents multiple adds.
-
- 17 Jan, 2023 1 commit
-
-
Paul Fultz II authored
-
- 11 Jan, 2023 1 commit
-
-
Paul Fultz II authored
* Use cosine to compute half sin
-
- 09 Jan, 2023 1 commit
-
-
Ted Themistokleous authored
JIT implementation of the gather operator. Added a few more unit tests to this one as well since I saw some odd behavior during bring up.
-
- 11 Dec, 2022 1 commit
-
-
Umang Yadav authored
HIP changed in previous ROCm releases to use --offload-arch instead of --cuda-gpu-arch. This should be backwards compatible. hipRTC also supports --offload-arch.
-
- 07 Dec, 2022 1 commit
-
-
Paul Fultz II authored
* Add implicit_conversion
-