Commits · 8e7d2efe96ad673aac65203c8a7af3f1e01964d5 · gaoqiong / MIGraphX

08 Dec, 2022 3 commits

Dynamic reference Softmax (#1475) · 8e7d2efe

Charlie Lin authored Dec 08, 2022

No major changes required, use dyn_output and pass dynamic shape when calling compute_shape()
Adds dynamic shape tests

8e7d2efe

Dynamic ref flatten (#1482) · 4c32afcc

Charlie Lin authored Dec 08, 2022

Changes flatten's compute_shape() to handle dynamic shapes
Calculates the flattened shape with the min, max, and opt

4c32afcc

fix issues with compiling lstm ops in fp16 mode (#1450) · 352c2465

shivadbhavsar authored Dec 07, 2022

Currently, quantizing a program with rnn layers to fp16 results in segmentation faults due to a "convert" operation being applied to an "undefined" instruction.

The following changes are implemented to fix this issue:

Added is_undefined method to the instruction class that returns true if all inputs to the instruction are from an undefined op.
Updated rewrite_rnn pass to use the new is_undefined method rather than checking ins->name()
Updated the dead_code_elimination pass to also use this new method rather than only checking the instruction name

352c2465

07 Dec, 2022 1 commit

Dynamic ref Argmax (#1478) · 231d60a2

Charlie Lin authored Dec 07, 2022

Extends the Argmax operator to handle dynamic input shapes.
Only shape function changes

231d60a2

06 Dec, 2022 3 commits

Add tupleVisitor for from_gpu (#1465) · a4c2b889

Ted Themistokleous authored Dec 06, 2022

Need this for when we debug and use MIGRAPHX_TRACE_EVAL() to show tuples
Without this we break when reading our buffer due to the use of visit()
This came up as part of #1283 debugging.

a4c2b889

Dynamic ref squeeze and unsqueeze (#1426) · 48cc33e4

Charlie Lin authored Dec 06, 2022

Extends unsqueeze and squeeze to work for dynamic input shapes
Does not handle the steps parameter
Adds some additional negative axes shape tests

48cc33e4

Update MLIR integration (#1451) · be70702d

jungpark-mlir authored Dec 06, 2022

Update dialect registration interface
Update 2nd build pipeline call and use full arch name

be70702d

02 Dec, 2022 2 commits

Refactor non-standard literal construction (#1443) · fdc3f00a

Charlie Lin authored Dec 02, 2022

Fix problem with the contiguous operator constructing non-standard shape literals.  A non-standard literal will almost never be used, since a literal is known at compile time.  Added some comments on the intended behavior:

- literal{shape, vector} constructor with a non-standard shape is intended to keep the same ordering as the given vector. The data buffer will be populated such that when the non-standard indexing is used the original order is as given.
- literal{shape, argument} constructor directly copies the data buffer from the argument
- Changed non-standard literal fill() to use tensor_view iterators as it handles non-standard shapes now
- Changed the contiguous ref_ops_test to be more helpful

fdc3f00a

Dynamic ref pooling (#1449) · 0e40ebaa

Charlie Lin authored Dec 02, 2022

Extends the pooling operators for dynamic shape inputs

AveragePooling
GlobalAveragePooling
MaxPooling
GlobalMaxPooling
LpNormPooling
GlobalLpNormPooling
y.github.com>

0e40ebaa

28 Nov, 2022 1 commit

Dynamic ref transpose (#1438) · 32b08891

Charlie Lin authored Nov 28, 2022

Extends ref transpose operator for dynamic shapes
Make dynamic tests more consistent naming

32b08891

17 Nov, 2022 2 commits

Fix logical_xor type checking (#1458) · af7e6eaa
Ted Themistokleous authored Nov 17, 2022
```
Fix to stop types failing for logical_xor during our fusions. 
```
af7e6eaa

Dynamic ref contiguous (#1445) · 95d82a51

Charlie Lin authored Nov 17, 2022

Extends the ref contiguous operator to handle dynamic shapes
Updates the eliminate_contiguous pass to use the dyn_output struct

95d82a51

13 Nov, 2022 1 commit

Dyn ref multibroadcast; dyn binary (#1423) · d73c6d7c

Charlie Lin authored Nov 13, 2022

Updated Multibroadcast op to have a two input version for dynamic shapes
Current dynamic shape broadcasting logic
dynamic_dimensions must be the same or one of them is {1, 1, 0} or {1, 1, 1}
Works for dyn-dyn, dyn-static, and static-static shape combinations
Changed common.cpp for multibroadcasting for binary ops with dynamic shapes
Extended binary.hpp for dynamic shapes to test the new common.cpp stuff

d73c6d7c

02 Nov, 2022 2 commits
- Add nhwc layout to gpu backend (#1391) · 1820198e
  Paul Fultz II authored Nov 02, 2022
```
Can be enabled via environment variable MIGRAPHX_ENABLE_NHWC
```
  1820198e
- Concat pointwise fusions (#1388) · 2f48b11a
  Paul Fultz II authored Nov 02, 2022
  
  2f48b11a
01 Nov, 2022 1 commit

Add opset-13 support for parse_split (#1429) · 70d0e816

Ted Themistokleous authored Nov 01, 2022

Newer split moves the split attribute to an input. In this case we check the
number of input args then.

70d0e816

28 Oct, 2022 1 commit

Use minimum block size of 64 threads (#1427) · 25a0e433

Umang Yadav authored Oct 28, 2022

Local Threads of multiples 32 were introduced in #1348
But LocalThreads that are not multiple of 64 are causing correctness issues.

25a0e433

27 Oct, 2022 2 commits

Upgrade CI environment to 5.3.0 (#1198) · 4b1c1c41

Chris Austen authored Oct 27, 2022

Upgraded Dockerfiles and fixed tidy issues to make Ubuntu 20.04 and ROCm 5.3.0 the default

4b1c1c41

Add JIT pad (#1411) · 0d841ded

kahmed10 authored Oct 27, 2022

updated GPU pad to now use JIT version.
added range functions for JIT kernels.

0d841ded

26 Oct, 2022 1 commit
- rearrange default pass list; adjust_allocation must be run after rep… (#1418) · 7b9ce460
  Brian Pickrell authored Oct 26, 2022
```
Fixes an observed regression error on certain Frozen Protobuf models due to PR 1280
```
  7b9ce460
19 Oct, 2022 2 commits

Refactor dynamic compute; Dynamic ref unary functions (#1407) · 693cb5d8

Charlie Lin authored Oct 19, 2022

Refactor dynamic compute
- add a compute_output_shape object that implicitly converts to a new dyn_output or shape object
- dyn_output object can handle computing the static output shape of an operator given the input arguments shapes
  change an operator's compute function to argument compute(const dyn_output& dyn_out, std::vector<argument> args) to 
  use dyn_output object

Dynamic ref unary functions
-  Included these changes to have an example of the refactored dynamic compute being used
-  Changes to unary base class to handle dynamic shapes
-  Changed elu and leaky_relu to use unary base class and pointwise JIT

693cb5d8

Find2.0 changes for the Quant and De-Convolution (#1408) · 5fa42993

Umang Yadav authored Oct 19, 2022



* use find2.0 for the convolution
Co-authored-by: Vasilii Filippov <DrizztDoUrden@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

5fa42993

18 Oct, 2022 1 commit

Add support in mlir for transposed and broadcasted shaped (#1378) · c3e02b18

Paul Fultz II authored Oct 18, 2022



* Enable non-standard shape
* Use perfdb for non xdlops
* Fix transpose+broadcast strides
Co-authored-by: jungpark-mlir <jungwook.park@amd.com>

c3e02b18

17 Oct, 2022 1 commit

memset fix (#1414) · 83784c52

Umang Yadav authored Oct 17, 2022

hipMemset is causing random failure.
hipMemsetAsync is doing the correct synchronization.

83784c52

14 Oct, 2022 1 commit

Fix rank 2 batch norm (#1412) · 01d0ecfc

Charlie Lin authored Oct 14, 2022

Allows for rank 2 tensors into batchnorm.  Specifically when spatial dimensions are all 1 and removed

01d0ecfc

13 Oct, 2022 2 commits

Refactor dynamic padding mode (#1387) · 32f6388c

Charlie Lin authored Oct 13, 2022

Removes use_dynamic_same_auto_pad
Change padding_mode to be used for dynamic padding
Move compute_padded_shape to pad_calc.cpp as it will be used in other dynamic padding cases
Fix same_lower compute_padded_shape bug and add a test.

32f6388c

Rewrite TF batch norm; remove batch_norm_inference (#1371) · be309bfb

Charlie Lin authored Oct 13, 2022

Rewrites the TF batch norm like operators to other MIGX operators
Removes the code related to batch_norm_inference

be309bfb

07 Oct, 2022 1 commit

Simplify unit algebraic ops (#1281) · 4f3cc417

Ted Themistokleous authored Oct 07, 2022

Simplified algebraic operations (x*1), x*(-1), x/1, 0+x & x+0,  x-0, 0-x, 0*x, x*0, and 0/x operations

4f3cc417

04 Oct, 2022 2 commits
- Stream sync Changset (#1358) · f7d987ba
  Ted Themistokleous authored Oct 04, 2022
```
Stream sync changes and associated API level changes
```
  f7d987ba
- Fast softmax (#1290) · a9a47402
  Paul Fultz II authored Oct 04, 2022
```
optimize the softmax operator
```
  a9a47402
03 Oct, 2022 1 commit

Add output_alias and runs_on_offload_target flags for the custom ops (#1309) · c9ffb38d

Umang Yadav authored Oct 03, 2022

Adds two methods for the custom_ops virtual class.

bool runs_on_offload_target(), if the custom op runs directly on the gpu then it should be set to true. in this case, custom op expects its parameters to reside in GPU memory and writes output to the GPU memory. If it is set to false then, custom op expects it's parameter to reside on the host and puts back the result into the host memory.

output_alias, if output of the custom op is aliasing the input buffer. i.e. interpreting the same input buffer with differnet shape and strides.

Update as_vector() in C++ API to handle non-standard shapes. It required exposing element_index to space_index conversion method for the shape class.

c9ffb38d

29 Sep, 2022 1 commit

Use find_2.0 API for the convolution (#1346) · e19f78ae

Umang Yadav authored Sep 29, 2022

Improvements/Additions to be made:

changes for the quant_convolution,
changes for the deconvolution,
Macros for MIOpen status checks

e19f78ae

28 Sep, 2022 1 commit

Add compute_fp32 flag for quant_gemm tests (#1360) · 70e63960

Umang Yadav authored Sep 28, 2022

test_gpu_pack_int8_args fails on gfx908 machine, because it doesn't set compute_fp32 flag correctly. This PR fixes the test such that it checks for the device-name, and rocblas-versions and sets this flag accordingly.

70e63960

27 Sep, 2022 1 commit
- Add onnx mod operator gpu cpu (#1306) · 40118191
  Ted Themistokleous authored Sep 26, 2022
```
Implement operator for CPU and GPU implementations
```
  40118191
26 Sep, 2022 2 commits

Rewrite ONNX parse batch norm (#1362) · c00f8202

Charlie Lin authored Sep 26, 2022

Rewrites the BatchNormalization ONNX operator into other MIGX operators
- Added handling of 1D input tensor case (edge case in ONNX spec)
Removes the spatial and per_activation functionality (not in the ONNX spec)
- Did not remove the batch_norm_inference related code as the TensorFlow parser still uses it
- Can remove that code when the TF version is updated

c00f8202

Upgrade cppcheck to 2.9 (#1400) · 66bbff1e
Paul Fultz II authored Sep 26, 2022
```
Upgrade cppcheck to 2.9 
```
66bbff1e

23 Sep, 2022 1 commit
- Remove unused device functions (#1394) · 8ea8473d
  Paul Fultz II authored Sep 23, 2022
```
* Remove device functions
* Update tests
```
  8ea8473d
21 Sep, 2022 2 commits

Parameterize epsilon for layernorm kernel (#1367) · d9578ba6

kahmed10 authored Sep 21, 2022

This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.

d9578ba6

Multibroadcast find_mul_conv (#1384) · 9a70050b

Charlie Lin authored Sep 21, 2022

Change find_mul_conv to work with multibroadcast also. Checks the strides instead of the broadcast axis.

9a70050b

19 Sep, 2022 1 commit

Improve layernorm and reductions performance (#1348) · 97a1ed2d

Paul Fultz II authored Sep 19, 2022

Compute mean and variance in same reduction
Set block size to numbers divisible by 32 instead powers of 2
Global is also set exactly instead of being divisible by block size
More exact matching of global/local can help get rid of branching/loops
Reduce vectors first before doing dpp_reduce
Explicitly vectorize array operators since the compiler doesnt always vectorize them
Still uses old for loop when its computing at compile-time since the reinterpret_cast nor the all the vector types is supported

97a1ed2d