"docs/archive_en_US/Tuner/NetworkmorphismTuner.md" did not exist on "a71cbe85dca1445f4c62991d9ac7cc4b21c4673e"
  1. 18 Jul, 2022 1 commit
  2. 11 Jul, 2022 2 commits
  3. 08 Jul, 2022 2 commits
  4. 05 Jul, 2022 1 commit
  5. 30 Jun, 2022 12 commits
  6. 29 Jun, 2022 1 commit
  7. 25 Jun, 2022 1 commit
  8. 24 Jun, 2022 2 commits
    • Adding in check_stamped.py to tools/ (#1255) · 8c35fa94
      Ted Themistokleous authored
      Used to determine which files contain a license stamp. If any do not, the script exits with an error code that can later be ingested by another script, along with a list of the outstanding files in question.

      The list of files that should or should not carry a license is currently baked in, along with some patterns to ignore quickly.
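      As a rough illustration of the approach described above, here is a minimal Python sketch of such a license-stamp checker. The marker string, checked suffixes, and ignore list are assumptions for illustration, not the values used by the real check_stamped.py.

      ```python
      #!/usr/bin/env python3
      """Minimal sketch of a license-stamp checker (not the actual check_stamped.py)."""
      import sys
      from pathlib import Path

      LICENSE_MARKER = "Permission is hereby granted"   # assumed license marker text
      CHECKED_SUFFIXES = {".cpp", ".hpp", ".py"}        # assumed file types to check
      IGNORED_DIRS = {"build", ".git"}                  # assumed quick-ignore list

      def unstamped_files(root: Path):
          """Yield checked files that do not contain the license marker."""
          for path in root.rglob("*"):
              if path.suffix not in CHECKED_SUFFIXES:
                  continue
              if any(part in IGNORED_DIRS for part in path.parts):
                  continue
              if LICENSE_MARKER not in path.read_text(errors="ignore"):
                  yield path

      if __name__ == "__main__":
          missing = list(unstamped_files(Path(sys.argv[1] if len(sys.argv) > 1 else ".")))
          for path in missing:
              print(path)                  # the list of outstanding files
          sys.exit(1 if missing else 0)    # non-zero exit for downstream scripts
      ```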
    • Add compute_method for the experimental custom op (#1194) · edc7be5c
      Umang Yadav authored
      Adds a compute_method for the experimental custom ops, along with a test that exercises it through HIP APIs.
      Depends on #1183
      Solves #1101
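      As a loose illustration only, the sketch below shows the general shape of a custom op that exposes a compute method which the runtime invokes with an output shape and input arguments. All names here are hypothetical; this is not MIGraphX's experimental custom-op API.

      ```python
      # Hypothetical custom-op interface, for illustration only.
      class CustomOpBase:
          def name(self) -> str:
              raise NotImplementedError

          def compute(self, output_shape, args):
              raise NotImplementedError

      class Negate(CustomOpBase):
          def name(self) -> str:
              return "negate"

          def compute(self, output_shape, args):
              # A real implementation would launch a device kernel (e.g. via
              # HIP APIs, as the test in this PR does); here we just negate.
              return [-a for a in args[0]]

      op = Negate()
      print(op.compute(output_shape=(3,), args=[[1, -2, 3]]))  # [-1, 2, -3]
      ```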
  9. 22 Jun, 2022 1 commit
  10. 17 Jun, 2022 2 commits
    • Update tf_parser to have add_common_op() for parse_relu6 (#1241) · 421a5621
      Ted Themistokleous authored
      
      
      * [#935] Update tf_parser to have add_common_op() for parse_relu6

      Similar to what was done in onnx_parser.cpp, add an add_common_op template and the functionality to support clip-based operations. This is done so clip operations can be guaranteed to have the same dimensions.
      Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
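      For context, relu6 is simply a clip of the input into [0, 6]. The numpy sketch below (an illustration, not the parser code) shows why the min/max literals must first be broadcast to the input's shape so that all of the clip's arguments share the same dimensions, which is what add_common_op takes care of.

      ```python
      import numpy as np

      def relu6(x: np.ndarray) -> np.ndarray:
          """Illustrative relu6 as clip(x, 0, 6) with explicitly broadcast bounds."""
          # In a graph IR without implicit broadcasting, the scalar bounds must
          # first be broadcast to x's shape so all clip inputs have the same
          # dimensions -- the role played by add_common_op in the parser.
          lo = np.broadcast_to(np.float32(0.0), x.shape)  # min literal, broadcast
          hi = np.broadcast_to(np.float32(6.0), x.shape)  # max literal, broadcast
          return np.minimum(np.maximum(x, lo), hi)        # clip(x, 0, 6)

      print(relu6(np.array([-1.0, 3.0, 7.5], dtype=np.float32)))  # [0. 3. 6.]
      ```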
    • Create allocate op and replace_allocate pass (#1183) · add6fb3b
      kahmed10 authored
      
      
      * add allocate op header
      * add replace_allocate pass
      * move output param to remove_allocate pass
      * fix bugs in replace_allocate pass
      * fix verify if tests
      * move if op logic
      * cleanup lowering
      * fix tidy
      * add cpu allocate check
      * change cpu allocate in pass
      * add some tests for replace_allocate pass
      * pass by ref
      * fix run_pass
      * update variable name for module
      * update dce to use contains() and fix tidy
      * update cppcheck
      * add if test
      * rename var to mod_output_names
      * remove conditional
      * update allocate op and tests
      * update replace_allocate tests
      * update create_output_names() and conditional in replace_allocate
      * remove extra variable in replace_allocate
      * update tools script for allocation_model
      Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
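      To give a rough idea of what such a pass does, here is a toy Python sketch: explicit allocate instructions that produce a module's outputs are rewritten into output parameters supplied by the caller. The tiny IR and all names below are illustrative assumptions, not MIGraphX's data structures.

      ```python
      from dataclasses import dataclass, field

      @dataclass
      class Instruction:
          op: str
          inputs: list = field(default_factory=list)

      @dataclass
      class Module:
          instructions: list
          params: list = field(default_factory=list)

      def replace_allocate(mod: Module) -> None:
          """Rewrite each `allocate` into a caller-supplied output parameter."""
          for i, ins in enumerate(mod.instructions):
              if ins.op == "allocate":
                  name = f"output_{len(mod.params)}"       # hypothetical naming
                  mod.params.append(name)
                  mod.instructions[i] = Instruction(op=f"@param:{name}")

      mod = Module([Instruction("allocate"), Instruction("gpu::add")])
      replace_allocate(mod)
      print([ins.op for ins in mod.instructions], mod.params)
      # ['@param:output_0', 'gpu::add'] ['output_0']
      ```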
  11. 16 Jun, 2022 1 commit
  12. 07 Jun, 2022 1 commit
  13. 02 Jun, 2022 1 commit
  14. 26 May, 2022 1 commit
  15. 24 May, 2022 2 commits
  16. 11 May, 2022 1 commit
  17. 10 May, 2022 1 commit
  18. 06 May, 2022 1 commit
  19. 03 May, 2022 1 commit
  20. 29 Apr, 2022 1 commit
  21. 26 Apr, 2022 1 commit
  22. 23 Apr, 2022 1 commit
    • ReverseSequence op (#1177) · 31906785
      Charlie Lin authored
      Implements the ReverseSequence ONNX operator as a parser.

      This parser can only handle a constant sequence_lens input, which, as far as I can tell, matches what is handled for TensorRT. We could handle a variable sequence_lens input, but that would require ref and GPU implementations of the operator.
      The ONNX backend tests are disabled because this does not handle variable sequence_lens.
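      For reference, here is a small numpy model of the operator's semantics under the ONNX defaults batch_axis=1, time_axis=0 (an illustrative sketch with a constant sequence_lens, not the MIGraphX parser itself).

      ```python
      import numpy as np

      def reverse_sequence(x, sequence_lens, batch_axis=1, time_axis=0):
          """ONNX ReverseSequence: for batch slice i, reverse the first
          sequence_lens[i] elements along time_axis; leave the rest as-is."""
          out = x.copy()
          for i, n in enumerate(sequence_lens):
              idx = [slice(None)] * x.ndim
              idx[batch_axis] = slice(i, i + 1)   # keep dims so axes stay aligned
              idx[time_axis] = slice(0, n)        # only the first n steps reverse
              sl = tuple(idx)
              out[sl] = np.flip(x[sl], axis=time_axis)
          return out

      x = np.arange(16, dtype=np.float32).reshape(4, 4)  # time x batch
      print(reverse_sequence(x, [4, 3, 2, 1]))
      # column i holds batch i over time; e.g. column 0 becomes [12, 8, 4, 0]
      ```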
  23. 19 Apr, 2022 1 commit
    • Refactor Pooling and implement ONNX LpPool and GlobalLpPool (#1152) · 764273e4
      Charlie Lin authored
      Refactored the reference implementation of pooling to follow what was done for roialign, moving it from targets/ref/lowering.cpp to pooling.hpp.
      Removed cpu_pooling, using the reference pooling in pooling.hpp instead.
      Added a reference implementation of Lp norm pooling and the global version, along with tests for the Lp norm pooling.
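      For reference, Lp norm pooling computes y = (Σ |x|^p)^(1/p) over each window, and the global variant uses a single window covering the entire input. A tiny 1-D numpy sketch (illustrative only, not the pooling.hpp code):

      ```python
      import numpy as np

      def lp_pool_1d(x, kernel, stride, p=2):
          """Illustrative Lp norm pooling: y_j = (sum over window_j |x_k|^p)^(1/p)."""
          out = []
          for start in range(0, len(x) - kernel + 1, stride):
              window = x[start:start + kernel]
              out.append(np.sum(np.abs(window) ** p) ** (1.0 / p))
          return np.array(out)

      x = np.array([3.0, 4.0, 0.0, 12.0, 5.0])
      print(lp_pool_1d(x, kernel=2, stride=1, p=2))  # [ 5.  4. 12. 13.]
      # GlobalLpPool is the same reduction with the window covering the whole input:
      print(np.sum(np.abs(x) ** 2) ** 0.5)
      ```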
  24. 17 Apr, 2022 1 commit
    • Reduce with runtime compilation (#1150) · f9a5b81e
      Paul Fultz II authored
      There is a significant improvement on larger tensors, with half precision almost 50% faster:

      lens: [1024, 384, 768]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
      gpu::reduce_sum[axes={2}]: 1.73126ms

      Also, for non-trivial layouts this can sometimes be over 2x faster:

      lens: [64, 1024, 768, 4]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
      gpu::reduce_sum[axes={1}]: 2.63375ms

      Of course, if the stride becomes larger, this speed improvement diminishes due to poor memory access patterns; a lane_reduce instead of a block_reduce is needed for such kernels. I plan to address that in a future PR.

      Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
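      As a rough illustration of the memory-access point (an assumption-laden sketch, not the MIGraphX kernel): with row-major strides, reducing the innermost axis lets consecutive lanes of a block read adjacent addresses, while reducing an outer axis makes each step jump by a large stride.

      ```python
      # For lens [1024, 384, 768] reducing axes={2}, each output element owns a
      # contiguous run of 768 inputs, so lanes of one block read adjacent
      # addresses (coalesced). Reducing an outer axis jumps by a large stride.
      lens = (1024, 384, 768)
      strides = (384 * 768, 768, 1)  # row-major strides

      # Offsets read by lanes 0..3 when reducing the innermost axis (axes={2}):
      print([k * strides[2] for k in range(4)])  # [0, 1, 2, 3]: adjacent

      # Offsets for consecutive steps when reducing the middle axis (axes={1}):
      print([k * strides[1] for k in range(4)])  # [0, 768, 1536, 2304]: strided
      ```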