- 12 Dec, 2022 13 commits
-
Ted Themistokleous authored
-
Ted Themistokleous authored
-
Ted Themistokleous authored
Was debugging, trying to figure out why the indexing was incorrect; used a number of prints and such to trace it.
-
Ted Themistokleous authored
These work in tandem to create a shape via the calculate_strides() call. This seemed to introduce more issues than it fixed, since we don't have access to resize(). Right now this is cleanup, but I had used rev_partial_sum and the multiplies() template operator created in algorithm to achieve this during debugging for gather. The idea was to statically create the array() with calculate_strides() to fill in the empty stride dimensions.
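As an illustrative sketch only (the function name mirrors the calculate_strides() mentioned above, but this implementation is an assumption), the reverse partial product of the lengths can be expressed with std::partial_sum and std::multiplies over reverse iterators:

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Illustrative re-creation (not MIGraphX's actual code) of computing
// row-major strides from dimension lengths via a reverse partial product,
// as described with rev_partial_sum and multiplies().
std::vector<std::size_t> calculate_strides(const std::vector<std::size_t>& lens)
{
    std::vector<std::size_t> strides(lens.size(), 1);
    if(lens.size() < 2)
        return strides;
    // Walk the lens from the back, accumulating the running product into
    // the strides so the innermost dimension keeps stride 1.
    std::partial_sum(lens.rbegin(), lens.rend() - 1,
                     strides.rbegin() + 1,
                     std::multiplies<std::size_t>{});
    return strides;
}
```

For lens {2, 3, 4} this yields strides {12, 4, 1}, the standard row-major layout.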
-
Ted Themistokleous authored
This was added to implement calculate_strides() for array creation via shape(), but we needed to use forward iterators instead since we never implemented reverse iterators. Removing it since it's not needed, though it could still be used later if required.
-
Ted Themistokleous authored
This was added during debugging when attempting to add partial_sum() to replicate the device behavior when using calculate_strides(). In this case it isn't needed.
-
Ted Themistokleous authored
This is needed to get multi(i) below to work correctly when indexing. I originally thought this was an issue with the output_t.
-
Ted Themistokleous authored
Tried to get a properly templated shape for out_comp, but right now this breaks: I can't just update the lengths of a shape and get correct strides out, so it currently asserts. I think this is the cause of the axis > 0 failures, since we're not gathering properly for the other axes as a result and get repeated rows with the wrong data.
-
Ted Themistokleous authored
Currently failing the negative-indices and negative-axis tests; all others "seem" to work. Noticed an oddball case: the failing cases pass if the size of a container dimension is even instead of odd...
-
Ted Themistokleous authored
Add a stride-based multi-index similar to the device functions. Between the device gather and what's available for JIT, it looks like we were using lens instead of strides to calculate indices. This seems to fix the 1D indices case for this JIT gather.
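A hedged sketch of the stride-based multi-index idea (the function name and signature here are assumptions, not the actual JIT kernel code): recover an n-dimensional coordinate from a flat element index by successive division by the row-major strides rather than by the lens.

```cpp
#include <cstddef>
#include <vector>

// Illustrative stride-based multi-index: divide the flat index by each
// stride to get the coordinate for that dimension, keeping the remainder
// for the inner dimensions.
std::vector<std::size_t> multi_index(std::size_t idx,
                                     const std::vector<std::size_t>& strides)
{
    std::vector<std::size_t> out(strides.size());
    for(std::size_t d = 0; d < strides.size(); ++d)
    {
        out[d] = idx / strides[d]; // coordinate along dimension d
        idx %= strides[d];         // remainder indexes the inner dimensions
    }
    return out;
}
```

With row-major strides {12, 4, 1} for a 2x3x4 shape, flat index 17 maps to coordinate (1, 1, 1).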
-
Ted Themistokleous authored
Pair programming with Paul
-
Ted Themistokleous authored
-
Ted Themistokleous authored
Taken from gatherND.cpp and modified to use the axis parameter instead of the batch_dims attribute. The axis should always exist since we default it to zero when no axis is provided by the instruction. Work in progress for the .hpp JIT side.
-
- 11 Dec, 2022 1 commit
-
Umang Yadav authored
HIP changed in previous ROCm releases to use --offload-arch instead of --cuda-gpu-arch. This should be backwards compatible; hipRTC also supports --offload-arch.
-
- 07 Dec, 2022 1 commit
-
Paul Fultz II authored
* Add implicit_conversion
-
- 06 Dec, 2022 2 commits
-
Ted Themistokleous authored
Needed for when we debug and use MIGRAPHX_TRACE_EVAL() to show tuples. Without this we break when reading our buffer due to the use of visit(). This came up as part of the #1283 debugging.
-
jungpark-mlir authored
Update the dialect registration interface. Update the 2nd build pipeline call and use the full arch name.
-
- 29 Nov, 2022 1 commit
-
kahmed10 authored
Merging #1391 caused an extra adjust allocation pass to run for GPU targets. This removes that merge error.
-
- 20 Nov, 2022 1 commit
-
Paul Fultz II authored
-
- 18 Nov, 2022 1 commit
-
Umang Yadav authored
Disabling it until the int8 fix from MIOpen is in mainline, and also so that QA tests can run migraphx-driver and the unit tests from MIGraphX.
-
- 07 Nov, 2022 1 commit
-
arvindcheru authored
-
- 06 Nov, 2022 1 commit
-
Umang Yadav authored
-
- 02 Nov, 2022 2 commits
-
Paul Fultz II authored
Can be enabled via environment variable MIGRAPHX_ENABLE_NHWC
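As a hedged illustration of this kind of environment-variable gate (the variable name MIGRAPHX_ENABLE_NHWC comes from the commit; the helper below is an assumption, not MIGraphX's actual implementation):

```cpp
#include <cstdlib>
#include <string>

// Illustrative feature gate: treat the flag as enabled when the variable
// is set to any value other than "0".
bool env_flag_enabled(const char* name)
{
    const char* value = std::getenv(name);
    return value != nullptr && std::string(value) != "0";
}
```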
-
Paul Fultz II authored
-
- 28 Oct, 2022 1 commit
-
Umang Yadav authored
Local threads in multiples of 32 were introduced in #1348, but local thread counts that are not a multiple of 64 are causing correctness issues.
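An illustrative helper for the constraint described (an assumption, not code from the PR): round a requested local-thread count up to the next multiple of the wavefront size, which is 64 on most AMD GPUs.

```cpp
#include <cstddef>

// Round a local-thread count up to a multiple of the wavefront size,
// since counts that are not a multiple of 64 caused correctness issues.
std::size_t round_up_to_wavefront(std::size_t threads, std::size_t wavefront = 64)
{
    return ((threads + wavefront - 1) / wavefront) * wavefront;
}
```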
-
- 27 Oct, 2022 2 commits
-
Chris Austen authored
Upgraded Dockerfiles and fixed tidy issues to make Ubuntu 20.04 and ROCm 5.3.0 the default
-
kahmed10 authored
Updated the GPU pad operator to use the JIT version. Added range functions for JIT kernels.
-
- 26 Oct, 2022 1 commit
-
Brian Pickrell authored
Fixes an observed regression error on certain frozen Protobuf models introduced by PR #1280.
-
- 24 Oct, 2022 1 commit
-
jungpark-mlir authored
Reiterate the assertion on the standard shape but relax it for the multibroadcast ops deliberately inserted to make the broadcast explicit.
-
- 19 Oct, 2022 2 commits
-
Charlie Lin authored
Refactor dynamic compute:
- Add a compute_output_shape object that implicitly converts to a new dyn_output or shape object.
- The dyn_output object can handle computing the static output shape of an operator given the input arguments' shapes.
- Change an operator's compute function to argument compute(const dyn_output& dyn_out, std::vector<argument> args) to use the dyn_output object.
Dynamic ref unary functions:
- Included these changes to have an example of the refactored dynamic compute being used.
- Changes to the unary base class to handle dynamic shapes.
- Changed elu and leaky_relu to use the unary base class and pointwise JIT.
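A generic sketch of the implicit-conversion idea in this refactor. The type names come from the commit message; their members and behavior here are assumptions for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-ins for the real MIGraphX types.
struct shape
{
    std::vector<std::size_t> lens;
};

struct dyn_output
{
    shape computed_shape; // static output shape computed for this call
};

// A result type that can be consumed either as a plain shape (old-style
// compute signatures) or as a dyn_output (new-style signatures), via
// implicit conversion operators.
struct compute_output_shape
{
    shape s;
    operator shape() const { return s; }
    operator dyn_output() const { return dyn_output{s}; }
};
```

An operator's compute() can then declare either parameter type and receive the right object through the implicit conversion.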
-
Umang Yadav authored
* Use find2.0 for the convolution
Co-authored-by: Vasilii Filippov <DrizztDoUrden@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
-
- 18 Oct, 2022 1 commit
-
Paul Fultz II authored
* Enable non-standard shape
* Use perfdb for non-xdlops
* Fix transpose+broadcast strides
Co-authored-by: jungpark-mlir <jungwook.park@amd.com>
-
- 13 Oct, 2022 1 commit
-
Charlie Lin authored
Rewrites the TF batch-norm-like operators into other MIGX operators. Removes the code related to batch_norm_inference.
-
- 04 Oct, 2022 2 commits
-
Ted Themistokleous authored
Stream sync changes and associated API level changes
-
Paul Fultz II authored
Optimize the softmax operator.
-
- 03 Oct, 2022 1 commit
-
Umang Yadav authored
Adds two methods to the custom_ops virtual class:
- bool runs_on_offload_target(): if the custom op runs directly on the GPU, this should return true; in that case the custom op expects its parameters to reside in GPU memory and writes its output to GPU memory. If it returns false, the custom op expects its parameters to reside on the host and puts the result back into host memory.
- output_alias: whether the output of the custom op aliases an input buffer, i.e. interprets the same input buffer with a different shape and strides.
Also updates as_vector() in the C++ API to handle non-standard shapes. This required exposing the element_index to space_index conversion method on the shape class.
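A hedged sketch of what an element_index to space_index conversion does for a shape with arbitrary strides (the real method's signature may differ): map a logical element index, taken row-major over the lens, to the actual memory offset by dotting the recovered multi-index with the strides.

```cpp
#include <cstddef>
#include <vector>

// Illustrative conversion from a logical element index to a memory
// offset ("space index") under arbitrary strides, walking the
// dimensions from innermost to outermost.
std::size_t space_index(std::size_t element_idx,
                        const std::vector<std::size_t>& lens,
                        const std::vector<std::size_t>& strides)
{
    std::size_t offset = 0;
    for(std::size_t d = lens.size(); d-- > 0;)
    {
        offset += (element_idx % lens[d]) * strides[d]; // coordinate * stride
        element_idx /= lens[d];
    }
    return offset;
}
```

For a standard shape the two indices coincide; for a transposed 2x3 shape with strides {1, 2}, logical element 4 (coordinate (1, 1)) lands at memory offset 3.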
-
- 29 Sep, 2022 1 commit
-
Umang Yadav authored
Improvements/additions still to be made: changes for quant_convolution, changes for deconvolution, and macros for MIOpen status checks.
-
- 28 Sep, 2022 1 commit
-
Umang Yadav authored
test_gpu_pack_int8_args fails on gfx908 machines because it doesn't set the compute_fp32 flag correctly. This PR fixes the test so that it checks the device name and rocBLAS version and sets the flag accordingly.
-
- 27 Sep, 2022 1 commit
-
Ted Themistokleous authored
Implement operator for CPU and GPU implementations
-
- 26 Sep, 2022 1 commit
-
Paul Fultz II authored
-