Commits · 5af9aac00a02daa1f2c351bcdc0063e00f30d89a · gaoqiong / MIGraphX

20 Feb, 2023 1 commit
- Tidy fixes · 5de36e4a
  charlie authored Feb 20, 2023
  
  5de36e4a
16 Feb, 2023 1 commit

Copy into registers first when doing reductions with layernorm and softmax (#1489) · ac531d99

Paul Fultz II authored Feb 16, 2023

Avoids double global loads.  Strided loops are unrolled which lets store results in array which compiler will use registers for since the index access is constant.   Updated to handle large reductions so which results with a better stable diffusion result

ac531d99

15 Feb, 2023 1 commit
- Add another test · e833a916
  charlie authored Feb 15, 2023
  
  e833a916
14 Feb, 2023 1 commit

Somehow this verify test works · 996426be

charlie authored Feb 14, 2023

* Changed the allocates to occur in the submodules
  * Incomplete, as the use_local_alloc variable in module does not work
  properly
* added a hip::sync_stream before the return
* not sure why the hip::sync_stream gets rid of the dangling reference
error (code-wise it's because hip::sync_stream's output alias is -1)

996426be

08 Feb, 2023 1 commit
- progress on gpu version · b8ebf8ad
  charlie authored Feb 08, 2023
  
  b8ebf8ad
17 Jan, 2023 1 commit
- Use float accumulator when reduction size is too large for half (#1515) · 3af50e07
  Paul Fultz II authored Jan 17, 2023
  
  3af50e07
13 Jan, 2023 1 commit
- Transpose slice fix (#1499) · 2c8149f6
  shivadbhavsar authored Jan 13, 2023
```
This PR resolves the bug addressed in #1496. 
```
  2c8149f6
11 Jan, 2023 1 commit
- Use cosine to compute half sin (#1508) · 3fb5c0ef
  Paul Fultz II authored Jan 11, 2023
```
* Use cosine to compute half sin
```
  3fb5c0ef
09 Jan, 2023 1 commit

Add JIT Gather Operator (#1492) · 054364cd

Ted Themistokleous authored Jan 09, 2023

JIT implementation of the gather operator
Added a few more unit tests to this one as well since I saw some odd behavior during bring up.

054364cd

02 Nov, 2022 1 commit
- Concat pointwise fusions (#1388) · 2f48b11a
  Paul Fultz II authored Nov 02, 2022
  
  2f48b11a
28 Oct, 2022 1 commit

Use minimum block size of 64 threads (#1427) · 25a0e433

Umang Yadav authored Oct 28, 2022

Local Threads of multiples 32 were introduced in #1348
But LocalThreads that are not multiple of 64 are causing correctness issues.

25a0e433

27 Oct, 2022 1 commit

Add JIT pad (#1411) · 0d841ded

kahmed10 authored Oct 27, 2022

updated GPU pad to now use JIT version.
added range functions for JIT kernels.

0d841ded

26 Oct, 2022 1 commit
- rearrange default pass list; adjust_allocation must be run after rep… (#1418) · 7b9ce460
  Brian Pickrell authored Oct 26, 2022
```
Fixes an observed regression error on certain Frozen Protobuf models due to PR 1280
```
  7b9ce460
19 Oct, 2022 2 commits

Refactor dynamic compute; Dynamic ref unary functions (#1407) · 693cb5d8

Charlie Lin authored Oct 19, 2022

Refactor dynamic compute
- add a compute_output_shape object that implicitly converts to a new dyn_output or shape object
- dyn_output object can handle computing the static output shape of an operator given the input arguments shapes
  change an operator's compute function to argument compute(const dyn_output& dyn_out, std::vector<argument> args) to 
  use dyn_output object

Dynamic ref unary functions
-  Included these changes to have an example of the refactored dynamic compute being used
-  Changes to unary base class to handle dynamic shapes
-  Changed elu and leaky_relu to use unary base class and pointwise JIT

693cb5d8

Find2.0 changes for the Quant and De-Convolution (#1408) · 5fa42993

Umang Yadav authored Oct 19, 2022



* use find2.0 for the convolution
Co-authored-by: Vasilii Filippov <DrizztDoUrden@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

5fa42993

13 Oct, 2022 2 commits

Refactor dynamic padding mode (#1387) · 32f6388c

Charlie Lin authored Oct 13, 2022

Removes use_dynamic_same_auto_pad
Change padding_mode to be used for dynamic padding
Move compute_padded_shape to pad_calc.cpp as it will be used in other dynamic padding cases
Fix same_lower compute_padded_shape bug and add a test.

32f6388c

Rewrite TF batch norm; remove batch_norm_inference (#1371) · be309bfb

Charlie Lin authored Oct 13, 2022

Rewrites the TF batch norm like operators to other MIGX operators
Removes the code related to batch_norm_inference

be309bfb

10 Oct, 2022 1 commit
- Remove leaku_relu and elu verify tests · 8c4ae897
  charlie authored Oct 10, 2022
  
  8c4ae897
04 Oct, 2022 1 commit
- Fast softmax (#1290) · a9a47402
  Paul Fultz II authored Oct 04, 2022
```
optimize the softmax operator
```
  a9a47402
29 Sep, 2022 2 commits
- Fix elu and leaky_relu pointwise JIT · 48c7c810
  charlie authored Sep 29, 2022
  
  48c7c810
- Fix test typo? · 5793740d
  charlie authored Sep 29, 2022
  
  5793740d
27 Sep, 2022 1 commit
- Add onnx mod operator gpu cpu (#1306) · 40118191
  Ted Themistokleous authored Sep 26, 2022
```
Implement operator for CPU and GPU implementations
```
  40118191
21 Sep, 2022 1 commit

Parameterize epsilon for layernorm kernel (#1367) · d9578ba6

kahmed10 authored Sep 21, 2022

This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.

d9578ba6

19 Sep, 2022 1 commit

Improve layernorm and reductions performance (#1348) · 97a1ed2d

Paul Fultz II authored Sep 19, 2022

Compute mean and variance in same reduction
Set block size to numbers divisible by 32 instead powers of 2
Global is also set exactly instead of being divisible by block size
More exact matching of global/local can help get rid of branching/loops
Reduce vectors first before doing dpp_reduce
Explicitly vectorize array operators since the compiler doesnt always vectorize them
Still uses old for loop when its computing at compile-time since the reinterpret_cast nor the all the vector types is supported

97a1ed2d

16 Sep, 2022 1 commit
- Progress on changing padding_mode · 0afab294
  charlie authored Sep 16, 2022
```
Weird bug with ref padding shape
still need to change parse_convolution
```
  0afab294
14 Sep, 2022 2 commits
- Reduce problem size of unbatched_gemm tests (#1383) · 333860ce
  turneram authored Sep 14, 2022
```
The verify tests from pr #1354 were still causing some codecov timeouts after merge. This PR further reduces the problem sizes to avoid these failures.
```
  333860ce
- Implement concat using jit compilation (#1356) · 7662d9c0
  Paul Fultz II authored Sep 14, 2022
```
* Implement concat using jit compilation
```
  7662d9c0
13 Sep, 2022 1 commit

Use rocblas_gemm_ex for batched gemms with broadcasted B (#1354) · a10a8ef1

turneram authored Sep 13, 2022

Improves performance for 4/6 GEMMs used by huggingface BERT models with batch_size>1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension.
The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes.

Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.

a10a8ef1

07 Sep, 2022 1 commit
- Fix accuracy bug when vectorizing slices (#1364) · 60aa0e48
  Paul Fultz II authored Sep 06, 2022
```
* Fix accuracy bug when vectorizing slices
```
  60aa0e48
06 Sep, 2022 1 commit
- Enable cppcheck rule for 'not', 'or' keywords (#1361) · d37a4df9
  Paul Fultz II authored Sep 06, 2022
```
Using not and or improves readability. The cppcheck rule will help ensure we are doing it consistently.
```
  d37a4df9
31 Aug, 2022 1 commit

Add pass to rewrite gelu as fast gelu (#1299) · 794a4335

turneram authored Aug 31, 2022

Rewrite_gelu pass replaces the gelu formula of x * (1/2) * (1 + erf(x/sqrt(2))) with the sigmoid approximation of x * Sigmoid(x * 1.702)

794a4335

27 Aug, 2022 1 commit

Improvements to handling and add constant passed to dot operator (#1280) · 8752875a

Paul Fultz II authored Aug 26, 2022

This will rewrite dot operators like X(Y + b) to XY + Xb when b is constant as we can fold the add away.
This improves handling pointwise with broadcasted operators, this helps improves const propagation.
Improve gemm fusion with a mul_add
Improve support for broadcast shapes in gemm

8752875a

17 Aug, 2022 1 commit
- Add jit layernorm fusion (#1301) · 1784584e
  Paul Fultz II authored Aug 16, 2022
  
  1784584e
16 Aug, 2022 1 commit
- Fix softmax accuracy issues (#1342) · 0e17a724
  Paul Fultz II authored Aug 16, 2022
  
  0e17a724
25 Jul, 2022 1 commit

Add fpga target (#1304) · 8a30d698

varunsh authored Jul 25, 2022

* Add is_supported to the target
* Add get_target_assignments
* Rename assignment to target_assignments
* Add ref target header to test
* Add fpga target
* Make context const in compute

8a30d698

06 Jul, 2022 1 commit

Verify load and save (#1265) · f2531606

Paul Fultz II authored Jul 05, 2022

*In the verification tests, check that saving and reloading the program is the same program. This also fixes serialization to always load instructions in the same order. There is also fixes for deconv and quant_conv which didn't save the solution id, and was broken for serialization.

f2531606

22 Jun, 2022 1 commit
- Update license files (#1248) · e44cecbc
  Ted Themistokleous authored Jun 22, 2022
```
Updated each source file in the repo with the existing license.
```
  e44cecbc
07 Jun, 2022 1 commit

Prioritizing int8 over int8x4 when it is applicable (#1218) · 37c47504

Zhuoran Yin authored Jun 07, 2022



prioritizing int8 over int8x4 when it is applicable
Amend return to continue in apply loop
Adding error handling in case int8x4 compilation failed
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

37c47504

02 Jun, 2022 1 commit
- Fix dangling reference with gemm add fusion (#1233) · 1339ba35
  Paul Fultz II authored Jun 01, 2022
  
  1339ba35
26 May, 2022 1 commit
- Upgrade to cppcheck 2.8 and fix new issues found (#1225) · a401e72a
  Paul Fultz II authored May 26, 2022
```
* Upgrade to cppcheck 2.8
```
  a401e72a