- 20 Feb, 2023 1 commit
-
-
charlie authored
-
- 16 Feb, 2023 4 commits
-
-
charlie authored
-
Paul Fultz II authored
Avoids double global loads. Strided loops are unrolled, which lets results be stored in an array that the compiler keeps in registers because the index access is constant. Updated to handle large reductions, which gives a better Stable Diffusion result.
-
Umang Yadav authored
* deprecate HCC
-
Umang Yadav authored
* Add driver flag "--exhaustive-tune" to enable tuning, and add support for the same in the C/C++ and Python APIs
-
- 15 Feb, 2023 2 commits
- 14 Feb, 2023 2 commits
-
-
charlie authored
* Changed the allocates to occur in the submodules
* Incomplete, as the use_local_alloc variable in module does not work properly
* Added a hip::sync_stream before the return
* Not sure why the hip::sync_stream gets rid of the dangling reference error (code-wise it's because hip::sync_stream's output alias is -1)
-
shivadbhavsar authored
Currently, we default to device 0 when loading programs. Updating this to use hipGetDevice to set the device for the loaded program.
-
- 10 Feb, 2023 2 commits
-
-
charlie authored
-
Umang Yadav authored
-
- 08 Feb, 2023 1 commit
-
-
charlie authored
-
- 06 Feb, 2023 2 commits
-
-
charlie authored
-
Paul Fultz II authored
* Fuse layernorm with different patterns
* Only match when using the last axis
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
-
- 31 Jan, 2023 2 commits
-
-
Umang Yadav authored
Added the CMake flag MIGRAPHX_USE_HIPRTC for hipRTC. Added stages in Jenkins for hipRTC. Fixed some of the pending hipRTC issues.
-
Paul Fultz II authored
* Add general optimize pass
* Fuse gemm multiplies by scalar
* Handle zero epsilon
-
- 19 Jan, 2023 1 commit
-
-
Paul Fultz II authored
This prevents multiple adds.
-
- 17 Jan, 2023 2 commits
-
-
Paul Fultz II authored
-
Charlie Lin authored
Extends the pad operator to handle dynamic input shapes. Only handles computing the shape for adding constant padding to a dynamic shape: the padding is added to the min, max, and opt values (unless opt is 0, in which case it stays 0). Reflect padding with dynamic shapes is not handled.
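The shape rule described in this commit message can be sketched in a few lines. This is an illustration only, not the MIGraphX implementation: `DynDim`, `pad_dyn_dim`, and `pad_dyn_shape` are hypothetical names, and treating opt == 0 as "no optimal value given" is an assumption drawn from the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynDim:
    """Hypothetical stand-in for one dynamic dimension (not the MIGraphX type)."""
    min_size: int
    max_size: int
    opt: int  # assumption: 0 means "no optimal value given"


def pad_dyn_dim(dim, pad_before, pad_after):
    """Add constant padding to the min, max, and opt of one dimension."""
    total = pad_before + pad_after
    # Per the commit message, an opt of 0 stays 0 instead of being padded.
    new_opt = 0 if dim.opt == 0 else dim.opt + total
    return DynDim(dim.min_size + total, dim.max_size + total, new_opt)


def pad_dyn_shape(dims, pads):
    """Apply (before, after) padding pairs to each dimension of a dynamic shape."""
    if len(dims) != len(pads):
        raise ValueError("need one (before, after) pair per dimension")
    return [pad_dyn_dim(d, before, after) for d, (before, after) in zip(dims, pads)]


# Usage: a batch dimension with no opt value, and a spatial dimension padded by 1 on each side.
shape = [DynDim(1, 4, 0), DynDim(24, 32, 28)]
padded = pad_dyn_shape(shape, [(0, 0), (1, 1)])
```

Reflect padding is deliberately left out of the sketch, matching the scope stated in the message.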
-
- 11 Jan, 2023 1 commit
-
-
Paul Fultz II authored
* Use cosine to compute half sin
-
- 09 Jan, 2023 1 commit
-
-
Ted Themistokleous authored
JIT implementation of the gather operator. Added a few more unit tests to this one as well, since I saw some odd behavior during bring-up.
-
- 11 Dec, 2022 1 commit
-
-
Umang Yadav authored
HIP changed in previous ROCm releases to use --offload-arch instead of --cuda-gpu-arch. This should be backwards compatible. hipRTC also supports --offload-arch.
-
- 08 Dec, 2022 3 commits
-
-
Charlie Lin authored
Extends the dot MIGX operator to handle dynamic input shapes. Only allows dot between two dynamic shapes that have exactly matching outer dimensions. Inner dimensions must also match correspondingly. Updates dot-related tests. Changes check_shapes to use shape.ndim(). ONNX parsers for GEMM and MatMul will be updated in a separate PR.
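A minimal sketch of the shape check this message describes, under stated assumptions: each dimension is modeled as a hypothetical (min, max) tuple, "outer dimensions" is read as every dimension except the last two, and "inner dimensions" as the contracted pair (last of the first input, second-to-last of the second). The function name `dot_dyn_shape` is illustrative and is not the MIGraphX API.

```python
def dot_dyn_shape(a, b):
    """Return the output shape of dot(a, b) for dynamic shapes.

    a, b: lists of (min, max) tuples, one tuple per dimension (hypothetical model).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must have the same rank, at least 2")
    # Assumed reading: outer (batch) dimensions must match exactly.
    if a[:-2] != b[:-2]:
        raise ValueError("outer dimensions must match exactly")
    # Assumed reading: the contracted dimensions must have the same range.
    if a[-1] != b[-2]:
        raise ValueError("inner dimensions must match")
    return a[:-2] + [a[-2], b[-1]]


# Usage: a dynamic batch of 1 to 4, multiplying 3x5 by 5xN where N ranges from 2 to 6.
lhs = [(1, 4), (3, 3), (5, 5)]
rhs = [(1, 4), (5, 5), (2, 6)]
out = dot_dyn_shape(lhs, rhs)
```

Requiring exact equality of the ranges is the strictest interpretation of "exactly matching"; a looser check on overlapping ranges would be a different design.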
-
charlie authored
-
Charlie Lin authored
No major changes required; use dyn_output and pass the dynamic shape when calling compute_shape(). Adds dynamic shape tests.
-
- 07 Dec, 2022 1 commit
-
-
Paul Fultz II authored
* Add implicit_conversion
-
- 06 Dec, 2022 2 commits
-
-
Ted Themistokleous authored
Need this for when we debug and use MIGRAPHX_TRACE_EVAL() to show tuples. Without this we break when reading our buffer due to the use of visit(). This came up as part of #1283 debugging.
-
jungpark-mlir authored
Update dialect registration interface. Update 2nd build pipeline call and use full arch name.
-
- 29 Nov, 2022 1 commit
-
-
kahmed10 authored
Merging #1391 caused an extra adjust allocation pass for GPU targets. This removes that merge error.
-
- 20 Nov, 2022 1 commit
-
-
Paul Fultz II authored
-
- 18 Nov, 2022 1 commit
-
-
Umang Yadav authored
Disabling it until the int8 fix from MIOpen is in mainline, and also so that QA tests can run migraphx-driver and the unit tests from MIGraphX.
-
- 15 Nov, 2022 1 commit
-
-
charlie authored
-
- 07 Nov, 2022 1 commit
-
-
arvindcheru authored
-
- 06 Nov, 2022 1 commit
-
-
Umang Yadav authored
-
- 03 Nov, 2022 2 commits
-
-
charlie authored
- 02 Nov, 2022 2 commits
-
-
Paul Fultz II authored
Can be enabled via environment variable MIGRAPHX_ENABLE_NHWC
-
Paul Fultz II authored
-
- 28 Oct, 2022 1 commit
-
-
Umang Yadav authored
Local thread counts that are multiples of 32 were introduced in #1348, but local thread counts that are not multiples of 64 are causing correctness issues.
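The constraint in this message can be illustrated by rounding a requested local thread count up to the next multiple of 64. The commit's actual change is not shown here, so this is only a sketch of one way to satisfy the constraint; `round_up_local_threads` and its parameters are hypothetical names.

```python
def round_up_local_threads(requested, multiple=64):
    """Round a requested local thread count up to the next multiple of `multiple`."""
    if requested <= 0 or multiple <= 0:
        raise ValueError("requested and multiple must be positive")
    # Integer ceiling division, then scale back up.
    return ((requested + multiple - 1) // multiple) * multiple


# Usage: counts that are only multiples of 32 get bumped to a multiple of 64.
counts = [round_up_local_threads(n) for n in (1, 32, 64, 96, 128)]
```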
-
- 27 Oct, 2022 1 commit
-
-
Chris Austen authored
Upgraded Dockerfiles and fixed tidy issues to make Ubuntu 20.04 and ROCm 5.3.0 the default
-