- 12 Dec, 2022 13 commits
-
Ted Themistokleous authored
-
Ted Themistokleous authored
-
Ted Themistokleous authored
Was debugging, trying to figure out why the indexing was incorrect; used a number of prints and such to trace it.
-
Ted Themistokleous authored
These work in tandem to create a shape via the calculate_strides() call. This seemed to introduce more issues than it fixed, since we don't have access to resize(). Right now this is cleanup, but I had used rev_partial_sum and the multiplies() template operator created in algorithm to achieve this during debugging for gather. The idea was to statically create the array() with calculate_strides() to fill in the empty stride dimensions.
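As an illustrative sketch only (the function name mirrors the calculate_strides() mentioned above, but this implementation is an assumption), the reverse partial product of the lengths can be expressed with std::partial_sum and std::multiplies over reverse iterators:

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Illustrative re-creation (not MIGraphX's actual code) of computing
// row-major strides from dimension lengths via a reverse partial product,
// as described with rev_partial_sum and multiplies().
std::vector<std::size_t> calculate_strides(const std::vector<std::size_t>& lens)
{
    std::vector<std::size_t> strides(lens.size(), 1);
    if(lens.size() < 2)
        return strides;
    // Walk the lens from the back, accumulating the running product into
    // the strides so the innermost dimension keeps stride 1.
    std::partial_sum(lens.rbegin(), lens.rend() - 1,
                     strides.rbegin() + 1,
                     std::multiplies<std::size_t>{});
    return strides;
}
```

For lens {2, 3, 4} this yields strides {12, 4, 1}, the standard row-major layout.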
-
Ted Themistokleous authored
This was added to implement calculate_strides() for array creation via shape(), but we needed to use forward iterators instead since we never implemented reverse iterators. Removing it since it's not needed, though it could still be used later if required.
-
Ted Themistokleous authored
This was added during debugging when attempting to add partial_sum() to replicate the device behavior when using calculate_strides(). In this case it isn't needed.
-
Ted Themistokleous authored
This is needed to get multi(i) below to work correctly when indexing. I originally thought this was an issue with the output_t.
-
Ted Themistokleous authored
Tried to get a properly templated shape for out_comp, but right now this breaks: I can't just update the lengths of a shape and get correct strides out, so it currently asserts. I think this is the cause of the axis > 0 failures, since we're not gathering properly for the other axes as a result and get repeated rows with the wrong data.
-
Ted Themistokleous authored
Currently failing the negative-indices and negative-axis tests; all others "seem" to work. Noticed an oddball case: the failing cases pass if the size of a container dimension is even instead of odd...
-
Ted Themistokleous authored
Add a stride-based multi-index similar to the device functions. Between the device gather and what's available for JIT, it looks like we were using lens instead of strides to calculate indices. This seems to fix the 1D indices case for this JIT gather.
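A hedged sketch of the stride-based multi-index idea (the function name and signature here are assumptions, not the actual JIT kernel code): recover an n-dimensional coordinate from a flat element index by successive division by the row-major strides rather than by the lens.

```cpp
#include <cstddef>
#include <vector>

// Illustrative stride-based multi-index: divide the flat index by each
// stride to get the coordinate for that dimension, keeping the remainder
// for the inner dimensions.
std::vector<std::size_t> multi_index(std::size_t idx,
                                     const std::vector<std::size_t>& strides)
{
    std::vector<std::size_t> out(strides.size());
    for(std::size_t d = 0; d < strides.size(); ++d)
    {
        out[d] = idx / strides[d]; // coordinate along dimension d
        idx %= strides[d];         // remainder indexes the inner dimensions
    }
    return out;
}
```

With row-major strides {12, 4, 1} for a 2x3x4 shape, flat index 17 maps to coordinate (1, 1, 1).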
-
Ted Themistokleous authored
Pair programming with Paul
-
Ted Themistokleous authored
-
Ted Themistokleous authored
Taken from gatherND.cpp and modified to use the axis parameter instead of the batch_dims attribute. The axis should always exist since we default it to zero when no axis is provided by the instruction. Work in progress for the .hpp JIT side.
-
- 11 Dec, 2022 1 commit
-
Umang Yadav authored
HIP changed in previous ROCm releases to use --offload-arch instead of --cuda-gpu-arch. This should be backwards compatible; hipRTC also supports --offload-arch.
-
- 07 Dec, 2022 1 commit
-
Paul Fultz II authored
* Add implicit_conversion
-
- 06 Dec, 2022 2 commits
-
Ted Themistokleous authored
Needed for when we debug and use MIGRAPHX_TRACE_EVAL() to show tuples. Without this we break when reading our buffer due to the use of visit(). This came up as part of the #1283 debugging.
-
jungpark-mlir authored
Update the dialect registration interface. Update the 2nd build pipeline call and use the full arch name.
-
- 29 Nov, 2022 1 commit
-
kahmed10 authored
Merging #1391 caused an extra adjust allocation pass to run for GPU targets. This removes that merge error.
-
- 20 Nov, 2022 1 commit
-
Paul Fultz II authored
-
- 18 Nov, 2022 1 commit
-
Umang Yadav authored
Disabling it until the int8 fix from MIOpen is in mainline, and also so that QA tests can run migraphx-driver and the unit tests from MIGraphX.
-
- 07 Nov, 2022 1 commit
-
arvindcheru authored
-
- 06 Nov, 2022 1 commit
-
Umang Yadav authored
-
- 02 Nov, 2022 2 commits
-
Paul Fultz II authored
Can be enabled via environment variable MIGRAPHX_ENABLE_NHWC
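As a hedged illustration of this kind of environment-variable gate (the variable name MIGRAPHX_ENABLE_NHWC comes from the commit; the helper below is an assumption, not MIGraphX's actual implementation):

```cpp
#include <cstdlib>
#include <string>

// Illustrative feature gate: treat the flag as enabled when the variable
// is set to any value other than "0".
bool env_flag_enabled(const char* name)
{
    const char* value = std::getenv(name);
    return value != nullptr && std::string(value) != "0";
}
```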
-
Paul Fultz II authored
-
- 28 Oct, 2022 1 commit
-
Umang Yadav authored
Local threads in multiples of 32 were introduced in #1348, but local thread counts that are not a multiple of 64 are causing correctness issues.
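An illustrative helper for the constraint described (an assumption, not code from the PR): round a requested local-thread count up to the next multiple of the wavefront size, which is 64 on most AMD GPUs.

```cpp
#include <cstddef>

// Round a local-thread count up to a multiple of the wavefront size,
// since counts that are not a multiple of 64 caused correctness issues.
std::size_t round_up_to_wavefront(std::size_t threads, std::size_t wavefront = 64)
{
    return ((threads + wavefront - 1) / wavefront) * wavefront;
}
```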
-
- 27 Oct, 2022 2 commits
-
Chris Austen authored
Upgraded Dockerfiles and fixed tidy issues to make Ubuntu 20.04 and ROCm 5.3.0 the default
-
kahmed10 authored
Updated the GPU pad operator to use the JIT version. Added range functions for JIT kernels.
-
- 26 Oct, 2022 1 commit
-
Brian Pickrell authored
Fixes an observed regression error on certain frozen Protobuf models introduced by PR #1280.
-
- 24 Oct, 2022 1 commit
-
jungpark-mlir authored
Reiterate the assertion on the standard shape but relax it for the multibroadcast ops deliberately inserted to make the broadcast explicit.
-
- 19 Oct, 2022 2 commits
-
Charlie Lin authored
Refactor dynamic compute:
- Add a compute_output_shape object that implicitly converts to a new dyn_output or shape object.
- The dyn_output object can handle computing the static output shape of an operator given the input arguments' shapes.
- Change an operator's compute function to argument compute(const dyn_output& dyn_out, std::vector<argument> args) to use the dyn_output object.
Dynamic ref unary functions:
- Included these changes to have an example of the refactored dynamic compute being used.
- Changes to the unary base class to handle dynamic shapes.
- Changed elu and leaky_relu to use the unary base class and pointwise JIT.
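A generic sketch of the implicit-conversion idea in this refactor. The type names come from the commit message; their members and behavior here are assumptions for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-ins for the real MIGraphX types.
struct shape
{
    std::vector<std::size_t> lens;
};

struct dyn_output
{
    shape computed_shape; // static output shape computed for this call
};

// A result type that can be consumed either as a plain shape (old-style
// compute signatures) or as a dyn_output (new-style signatures), via
// implicit conversion operators.
struct compute_output_shape
{
    shape s;
    operator shape() const { return s; }
    operator dyn_output() const { return dyn_output{s}; }
};
```

An operator's compute() can then declare either parameter type and receive the right object through the implicit conversion.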
-
Umang Yadav authored
* Use find2.0 for the convolution
Co-authored-by: Vasilii Filippov <DrizztDoUrden@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
-
- 18 Oct, 2022 1 commit
-
Paul Fultz II authored
* Enable non-standard shape
* Use perfdb for non-xdlops
* Fix transpose+broadcast strides
Co-authored-by: jungpark-mlir <jungwook.park@amd.com>
-
- 13 Oct, 2022 1 commit
-
Charlie Lin authored
Rewrites the TF batch-norm-like operators into other MIGX operators. Removes the code related to batch_norm_inference.
-
- 04 Oct, 2022 2 commits
-
Ted Themistokleous authored
Stream sync changes and associated API level changes
-
Paul Fultz II authored
Optimize the softmax operator.
-
- 03 Oct, 2022 1 commit
-
Umang Yadav authored
Adds two methods to the custom_ops virtual class:
- bool runs_on_offload_target(): if the custom op runs directly on the GPU, this should return true; in that case the custom op expects its parameters to reside in GPU memory and writes its output to GPU memory. If it returns false, the custom op expects its parameters to reside on the host and puts the result back into host memory.
- output_alias: whether the output of the custom op aliases an input buffer, i.e. interprets the same input buffer with a different shape and strides.
Also updates as_vector() in the C++ API to handle non-standard shapes. This required exposing the element_index to space_index conversion method on the shape class.
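A hedged sketch of what an element_index to space_index conversion does for a shape with arbitrary strides (the real method's signature may differ): map a logical element index, taken row-major over the lens, to the actual memory offset by dotting the recovered multi-index with the strides.

```cpp
#include <cstddef>
#include <vector>

// Illustrative conversion from a logical element index to a memory
// offset ("space index") under arbitrary strides, walking the
// dimensions from innermost to outermost.
std::size_t space_index(std::size_t element_idx,
                        const std::vector<std::size_t>& lens,
                        const std::vector<std::size_t>& strides)
{
    std::size_t offset = 0;
    for(std::size_t d = lens.size(); d-- > 0;)
    {
        offset += (element_idx % lens[d]) * strides[d]; // coordinate * stride
        element_idx /= lens[d];
    }
    return offset;
}
```

For a standard shape the two indices coincide; for a transposed 2x3 shape with strides {1, 2}, logical element 4 (coordinate (1, 1)) lands at memory offset 3.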
-
- 29 Sep, 2022 1 commit
-
Umang Yadav authored
Improvements/additions still to be made: changes for quant_convolution, changes for deconvolution, and macros for MIOpen status checks.
-
- 28 Sep, 2022 1 commit
-
Umang Yadav authored
test_gpu_pack_int8_args fails on gfx908 machines because it doesn't set the compute_fp32 flag correctly. This PR fixes the test so that it checks the device name and rocBLAS version and sets the flag accordingly.
-
- 27 Sep, 2022 1 commit
-
Ted Themistokleous authored
Implement operator for CPU and GPU implementations
-
- 26 Sep, 2022 1 commit
-
Paul Fultz II authored
-