Commits · f33f2298f72a97bb495c6cd60446ec92889b7333 · gaoqiong / MIGraphX

28 Jul, 2023 1 commit

Improve performance of pointwise/reduction kernels when using NHWC layouts (#1955) · f33f2298

Paul Fultz II authored Jul 28, 2023

* Improve performance of pointwise/reduction kernels when using NHWC layouts

* Format

* Add nhwc test

* Format

* Remove inline namespace

* Add reduce test

f33f2298

22 Jul, 2023 1 commit
- Throw on calling `shape.lens()` or `shape.strides()` on a dynamic shape and vice versa (#1937) · 4e24e65a
  Charlie Lin authored Jul 22, 2023
```
Throwing on these calls catches dynamic shape errors earlier rather than having to backpedal from a bad call
```
  4e24e65a
13 Jul, 2023 1 commit

Update deconvolution -> convolution_backwards and Dynamic Shape Support (#1801) · 4edf1195

Charlie Lin authored Jul 13, 2023

Renames deconvolution -> convolution_backwards to be more consistent with the literature
Note: this is not the cross-correlation operator (which is the adjoint of convolution). This is technically a standard convolution operator combined with an upsampling operator rather than a downsampling operator.
Adds unit tests for the padding, strides, dilations, and other op attributes.
Throws on auto_pad attribute since it has not been implemented
Previously it read the attribute and set it but then did nothing with it
Extended for dynamic shapes
Does not support using asymmetric padding (padding_L != padding_R) and output_shape with dynamic shapes.
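
As a rough illustration of the note above (a 1-D sketch in plain Python, not code from this PR; all function names are hypothetical), the operator can be written either as a direct scatter of the kernel or as zero-insertion upsampling followed by a standard full convolution, and both give the same result:

```python
def conv_transpose_1d(x, w, stride):
    """Direct definition: each input element scatters a scaled copy of the kernel."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += xi * wj
    return out


def upsample_zeros(x, stride):
    """Insert (stride - 1) zeros between neighbouring elements."""
    up = [0.0] * ((len(x) - 1) * stride + 1)
    for i, xi in enumerate(x):
        up[i * stride] = xi
    return up


def full_conv_1d(u, w):
    """Standard (kernel-flipping) full convolution."""
    out = [0.0] * (len(u) + len(w) - 1)
    for m in range(len(out)):
        for j, wj in enumerate(w):
            if 0 <= m - j < len(u):
                out[m] += u[m - j] * wj
    return out


x = [1.0, 2.0, 3.0]
w = [1.0, 0.5, 0.25]
direct = conv_transpose_1d(x, w, stride=2)
via_upsample = full_conv_1d(upsample_zeros(x, 2), w)
```

Padding, dilation and multiple channels are left out of this sketch to keep the equivalence easy to see.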

4edf1195

10 Jul, 2023 1 commit

Pooling op. calculation changes (#1823) · bb06dbf5

Brian Pickrell authored Jul 09, 2023

Changes the way the Pooling operation computes its result when there's padding. The old code would clip off any padding values before computing; for instance, if an Average pooling window contained 0 1 2 where the 0 is padding, the result was 1.5 instead of 1.0. See Issue 1766.
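
The difference can be reproduced with a small 1-D sketch (plain Python with stride 1; the function and its `count_padding` flag are hypothetical and only mirror the example in the message, not the actual MIGraphX implementation):

```python
def avg_pool_1d(values, window, pad, count_padding):
    """Average pooling with stride 1; assumes pad < window so no window is all padding."""
    padded = [None] * pad + list(values) + [None] * pad  # None marks a padding slot
    out = []
    for start in range(len(padded) - window + 1):
        win = padded[start:start + window]
        real = [v for v in win if v is not None]
        # Padding contributes 0 to the sum; the flag decides whether it counts in the divisor.
        divisor = window if count_padding else len(real)
        out.append(sum(real) / divisor)
    return out


old_behaviour = avg_pool_1d([1, 2], window=3, pad=1, count_padding=False)
new_behaviour = avg_pool_1d([1, 2], window=3, pad=1, count_padding=True)
```

Here the first window is `0 1 2` with the 0 coming from padding, so the clipped average is 1.5 and the padded average is 1.0.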

bb06dbf5

08 Jul, 2023 1 commit
- bump up CMake minimum version required to 3.15 (#1888) · 0e144f05
  Artur Wojcik authored Jul 09, 2023
  
  0e144f05
06 Jul, 2023 1 commit
- fix compilation warnings causing build failures (-Werror) (#1889) · a83371ca
  Artur Wojcik authored Jul 07, 2023
  
  a83371ca
02 Jul, 2023 1 commit

Improvement to ck integration (#1859) · 3c9df3b4

Paul Fultz II authored Jul 02, 2023

Add a CI job to test CK
Add MIGRAPHX_TUNE_CK env variable to only do tuning for CK
Continue tuning even when there are invalid configs
Fix a bug with parallel compilation not using all available threads
Add additional test for gemms using half types
Removed int32 as a supported type since it doesn't pass our test suite

3c9df3b4

23 Jun, 2023 1 commit
- Remove clamping for converts (#1853) · e794a63c
  Umang Yadav authored Jun 23, 2023
```
Fixes #1852  Fixes #1847
```
  e794a63c
01 Jun, 2023 1 commit

Convert Fp16 instance-norm to FP32 temporarily (#1779) · 49b341d3

Umang Yadav authored Jun 01, 2023

By converting to fp32: the fp16 3d-unet model accuracy comes out the same as the FP32 accuracy.

By using the reduce_sum method on fp16: accuracy comes out ~0.9% lower compared to fp32 while keeping the entire model in fp16.
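
To illustrate why accumulating in half precision can cost accuracy (a generic sketch of fp16 rounding using Python's `struct` half-float format, not the MIGraphX kernel; the helper names are hypothetical), summing many small values in fp16 stalls once the running total outgrows the 11-bit significand:

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def sum_half(values):
    """Accumulate with the running total rounded to half after every add."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return acc


values = [1.0] * 4096
half_total = sum_half(values)   # stalls at 2048: 2048 + 1 is not representable in half
float_total = sum(values)       # wider accumulator keeps every addend
half_mean = half_total / len(values)
float_mean = float_total / len(values)
```

With a wider accumulator the mean is 1.0, while the half accumulator reports 0.5, which is the kind of error a normalization layer would then propagate.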

49b341d3

25 May, 2023 1 commit
- Update cpp generator to handle inf from float (#1758) · 763dd1da
  Ted Themistokleous authored May 25, 2023
```
Use std::numeric_limits::min/max() functions plus the appropriate value to encode -inf/inf 
```
  763dd1da
20 May, 2023 1 commit
- Use half HIP APIs to compute max and min (#1764) · 88fb551c
  Umang Yadav authored May 19, 2023
```
* use half hip functions to compute max and min
* add verify test for min and max
```
  88fb551c
04 May, 2023 1 commit

Rewrite multiplies with dot operator (#1685) · 457703a8

Paul Fultz II authored May 04, 2023

When a multiply is applied to either the input or the output of a dot across the K dimension, the multiply can instead be applied to the constant, which can then be folded with propagate_const.
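
A minimal sketch of the identity behind this rewrite, for the input case only (plain Python lists; the variable names are hypothetical and the exact patterns the pass matches are not shown here): scaling the input along K and then taking the dot gives the same result as scaling the rows of the constant weight once and taking the dot with the unscaled input.

```python
def matmul(a, b):
    """Naive (M x K) by (K x N) matrix product on nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


x = [[1, 2, 3], [4, 5, 6]]      # M x K, runtime input
w = [[1, 0], [2, 1], [0, 3]]    # K x N, constant weights
s = [2, 3, 5]                   # one scale per K index

# Before the rewrite: multiply the input, then dot.
scaled_x = [[v * s[k] for k, v in enumerate(row)] for row in x]
before = matmul(scaled_x, w)

# After the rewrite: the multiply moves onto the constant, which can be precomputed.
folded_w = [[s[k] * v for v in row] for k, row in enumerate(w)]
after = matmul(x, folded_w)
```

Since `folded_w` depends only on constants, it can be computed once at compile time instead of on every run.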

457703a8

28 Apr, 2023 1 commit
- Removed split_single_dyn_dim compile flag (#1711) · bcc1f64a
  Charlie Lin authored Apr 28, 2023
  
  bcc1f64a
24 Apr, 2023 2 commits
- Fix compile failure in reduction fusion of instance norm (#1702) · 08360e83
  Paul Fultz II authored Apr 24, 2023
```
This fixes #1700
```
  08360e83
- Fix incorrect assertion in vec_packed_at (#1704) · 4339af75
  Paul Fultz II authored Apr 23, 2023
  
  4339af75
07 Apr, 2023 1 commit

Require the same type for the inputs and scales for QuantizeLinear (#1642) · f6e22d56

Paul Fultz II authored Apr 06, 2023

Converts can be inserted when the scales and input differ in the onnx file (we are already doing this implicit conversion in the ref implementation). This will also improve the compile time of quantizelinear.hpp since we can remove the nested visit method.

f6e22d56

05 Apr, 2023 1 commit

Optimize add convolution (#1549) · df32040d

Paul Fultz II authored Apr 05, 2023

This will replace conv(x+a, w) with conv(x, w) + conv(a, w) where a is a constant so conv(a, w) can be replaced with a constant.
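
The rewrite relies on convolution being linear in its input. A small 1-D check in plain Python (hypothetical helper name, integer data so the comparison is exact):

```python
def conv1d_valid(x, w):
    """1-D sliding dot product without padding."""
    n = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n)]


x = [1, 2, 3, 4, 5]          # runtime input
a = [10, 20, 30, 40, 50]     # constant added to the input
w = [1, -1, 2]               # constant weights

lhs = conv1d_valid([xi + ai for xi, ai in zip(x, a)], w)   # conv(x + a, w)
const_term = conv1d_valid(a, w)                            # conv(a, w), foldable
rhs = [p + q for p, q in zip(conv1d_valid(x, w), const_term)]
```

Because `const_term` depends only on constants, it can be evaluated ahead of time, leaving a single add after the convolution.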

df32040d

31 Mar, 2023 1 commit

Split single dynamic dimension compiler pass (#1580) · e9e3eacc

Charlie Lin authored Mar 30, 2023

Adds a new GPU compiler pass split_single_dyn_dim that handles when one input parameter has a single non-fixed dynamic_dimension.
Commonly occurs for dynamic batch or BERT sequence length
Splits the dynamic shape into several submodules with static input parameters to handle all of the cases in the dynamic_dimension range.
Essentially does what I manually did for the select_module verify tests
Adds a compile option split_single_dyn_dim that toggles the pass on/off. Defaults to false.
Updates verify_program.hpp and run_verify.cpp to allow for the tests to change the compile_options
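
The idea of the pass can be sketched as enumerating one static shape per value in the dynamic range (plain Python; representing a shape as a list of `(min, max)` pairs is an assumption made for this illustration, not the MIGraphX data structure):

```python
def split_single_dyn_dim(shape):
    """Return one static shape per value of the single non-fixed dimension.

    `shape` is a list of (min, max) pairs; a fixed dimension has min == max.
    Returns None when the shape does not have exactly one non-fixed dimension.
    """
    dynamic_axes = [i for i, (lo, hi) in enumerate(shape) if lo != hi]
    if len(dynamic_axes) != 1:
        return None
    axis = dynamic_axes[0]
    lo, hi = shape[axis]
    static_shapes = []
    for size in range(lo, hi + 1):
        dims = [d[0] for d in shape]
        dims[axis] = size
        static_shapes.append(tuple(dims))
    return static_shapes


# Dynamic batch of 1..4 on an NCHW input.
submodule_shapes = split_single_dyn_dim([(1, 4), (3, 3), (224, 224), (224, 224)])
```

Each resulting static shape would correspond to one submodule, with the matching one chosen at run time.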

e9e3eacc

29 Mar, 2023 1 commit
- Fix bug when concatting with the vectorization axis (#1653) · b1506c73
  Paul Fultz II authored Mar 29, 2023
  
  b1506c73
21 Mar, 2023 1 commit

select_module refactor (#1615) · 94a7f6ee

Charlie Lin authored Mar 21, 2023

Refactor to have select_module use output parameters
Disable select_module verify tests on cpu

94a7f6ee

18 Mar, 2023 1 commit
- Dynamically plug-in backend target libs (#1608) · 7a7040aa
  Umang Yadav authored Mar 18, 2023
```
Fixes #1595
```
  7a7040aa
17 Mar, 2023 2 commits
- Remove test_gather_literal_inputs test (#1628) · 9ef6801e
  Paul Fultz II authored Mar 17, 2023
  
  9ef6801e
- Fold const on last instruction (#1626) · 450c5e84
  Paul Fultz II authored Mar 17, 2023
```
This is the original testcase that sparked the error with missing proper const
folding. Pushing changes up to this branch and closing out the PR #1622
```
  450c5e84
10 Mar, 2023 2 commits
- Fix make_inner_storage function (#1607) · 5e132673
  Paul Fultz II authored Mar 10, 2023
  
  5e132673
- Fix static_assert in large reduction (#1604) · 206b9a51
  Paul Fultz II authored Mar 09, 2023
  
  206b9a51
28 Feb, 2023 1 commit

Select module op (#1569) · a63ee2e0

Charlie Lin authored Feb 28, 2023

Creates the select_module operator that selects one of the submodules passed to it to run based on the submodule parameters. The selected submodule is the one whose parameters have exactly the same static shapes as the arguments passed to select_module.
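
A minimal sketch of that selection rule (plain Python; the dictionary layout and the function are hypothetical stand-ins, not the operator's real interface):

```python
def select_module(submodules, arg_shapes):
    """Pick the submodule whose parameter shapes exactly match the argument shapes.

    `submodules` maps a submodule name to the list of static shapes of its parameters.
    """
    for name, param_shapes in submodules.items():
        if param_shapes == arg_shapes:
            return name
    raise ValueError("no submodule matches shapes {}".format(arg_shapes))


submodules = {
    "batch_1": [(1, 3, 224, 224)],
    "batch_2": [(2, 3, 224, 224)],
    "batch_4": [(4, 3, 224, 224)],
}
chosen = select_module(submodules, [(2, 3, 224, 224)])
```

Raising on a missing match is a choice made for this sketch; the message above does not say what the operator does in that case.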

a63ee2e0

23 Feb, 2023 1 commit
- Modify layernorm to allow higher overflow limit for lower precision (#1534) · 3c67e66f
  shivadbhavsar authored Feb 22, 2023
  
  3c67e66f
16 Feb, 2023 1 commit

Copy into registers first when doing reductions with layernorm and softmax (#1489) · ac531d99

Paul Fultz II authored Feb 16, 2023

Avoids double global loads. Strided loops are unrolled, which lets the results be stored in an array that the compiler will keep in registers since the index access is constant. Updated to handle large reductions as well, which gives a better stable diffusion result.

ac531d99

17 Jan, 2023 1 commit
- Use float accumulator when reduction size is too large for half (#1515) · 3af50e07
  Paul Fultz II authored Jan 17, 2023
  
  3af50e07
13 Jan, 2023 1 commit
- Transpose slice fix (#1499) · 2c8149f6
  shivadbhavsar authored Jan 13, 2023
```
This PR resolves the bug addressed in #1496. 
```
  2c8149f6
11 Jan, 2023 1 commit
- Use cosine to compute half sin (#1508) · 3fb5c0ef
  Paul Fultz II authored Jan 11, 2023
```
* Use cosine to compute half sin
```
  3fb5c0ef
09 Jan, 2023 1 commit

Add JIT Gather Operator (#1492) · 054364cd

Ted Themistokleous authored Jan 09, 2023

JIT implementation of the gather operator
Added a few more unit tests to this one as well since I saw some odd behavior during bring up.

054364cd

02 Nov, 2022 1 commit
- Concat pointwise fusions (#1388) · 2f48b11a
  Paul Fultz II authored Nov 02, 2022
  
  2f48b11a
28 Oct, 2022 1 commit

Use minimum block size of 64 threads (#1427) · 25a0e433

Umang Yadav authored Oct 28, 2022

Local thread counts in multiples of 32 were introduced in #1348,
but LocalThreads values that are not a multiple of 64 are causing correctness issues.

25a0e433

27 Oct, 2022 1 commit

Add JIT pad (#1411) · 0d841ded

kahmed10 authored Oct 27, 2022

Updated GPU pad to now use the JIT version.
Added range functions for JIT kernels.

0d841ded

26 Oct, 2022 1 commit
- rearrange default pass list; adjust_allocation must be run after rep… (#1418) · 7b9ce460
  Brian Pickrell authored Oct 26, 2022
```
Fixes an observed regression error on certain Frozen Protobuf models due to PR 1280
```
  7b9ce460
19 Oct, 2022 2 commits

Refactor dynamic compute; Dynamic ref unary functions (#1407) · 693cb5d8

Charlie Lin authored Oct 19, 2022

Refactor dynamic compute
- add a compute_output_shape object that implicitly converts to a new dyn_output or shape object
- dyn_output object can handle computing the static output shape of an operator given the input arguments shapes
- change an operator's compute function to `argument compute(const dyn_output& dyn_out, std::vector<argument> args)` to use the dyn_output object

Dynamic ref unary functions
-  Included these changes to have an example of the refactored dynamic compute being used
-  Changes to unary base class to handle dynamic shapes
-  Changed elu and leaky_relu to use unary base class and pointwise JIT

693cb5d8

Find2.0 changes for the Quant and De-Convolution (#1408) · 5fa42993

Umang Yadav authored Oct 19, 2022



* use find2.0 for the convolution
Co-authored-by: Vasilii Filippov <DrizztDoUrden@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

5fa42993

13 Oct, 2022 2 commits

Refactor dynamic padding mode (#1387) · 32f6388c

Charlie Lin authored Oct 13, 2022

Removes use_dynamic_same_auto_pad
Change padding_mode to be used for dynamic padding
Move compute_padded_shape to pad_calc.cpp as it will be used in other dynamic padding cases
Fix same_lower compute_padded_shape bug and add a test.
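
For reference, the difference between the two "same" modes can be sketched with the ONNX `auto_pad` definition (plain Python; the function name is hypothetical, and whether compute_padded_shape follows exactly this formula is an assumption):

```python
def same_padding(in_size, kernel, stride=1, dilation=1, lower=False):
    """Return (pad_begin, pad_end) so that out_size == ceil(in_size / stride)."""
    out_size = -(-in_size // stride)                       # ceiling division
    effective_kernel = (kernel - 1) * dilation + 1
    total = max((out_size - 1) * stride + effective_kernel - in_size, 0)
    small, large = total // 2, total - total // 2
    # same_upper puts the extra element at the end, same_lower at the beginning.
    return (large, small) if lower else (small, large)


upper = same_padding(5, kernel=2)               # odd total padding of 1
lower = same_padding(5, kernel=2, lower=True)
even = same_padding(7, kernel=3, stride=2)      # even total padding of 2
```

The two modes only differ when the total padding is odd, which is the case a same_lower test needs to cover.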

32f6388c

Rewrite TF batch norm; remove batch_norm_inference (#1371) · be309bfb

Charlie Lin authored Oct 13, 2022

Rewrites the TF batch norm like operators to other MIGX operators
Removes the code related to batch_norm_inference
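
As a sketch of what such a rewrite amounts to (plain Python on a single channel; the exact sequence of MIGX operators emitted by the parser is not stated here, so the decomposition below is an assumption based on the standard inference formula):

```python
import math


def batch_norm_inference(x, mean, variance, scale, bias, eps=1e-5):
    """Batch norm at inference time, written as a chain of elementwise steps."""
    centered = [v - mean for v in x]                 # sub
    denom = math.sqrt(variance + eps)                # add, sqrt
    normalized = [c / denom for c in centered]       # div
    return [n * scale + bias for n in normalized]    # mul, add


out = batch_norm_inference([1.0, 3.0], mean=2.0, variance=4.0,
                           scale=2.0, bias=1.0, eps=0.0)
```

Expressed this way, no dedicated batch norm operator is needed, only elementwise primitives.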

be309bfb