Commits · ca8a54fe732e725f0e22ebc09187bd71faf131a5 · gaoqiong / MIGraphX

03 Jul, 2022 1 commit

Paul Fultz II authored Jul 03, 2022

* Add mlir c api

* Formatting

* Create a type attribute

* Formatting

* Parse module

* Formatting

* Add mlir dump function

* Add test case

* Formatting

* Fix tidy issues

* Update mlir version

* Update to newer mlir

* Format

* Move mlir to the gpu and update the test

* Formatting

* Fix bug when appending module

* Format

* Remove old cmake flag

* Update message

* Add return

* Format

* Add mlir_compile

* Format

* Register dialect

* Handle unsigned integers

* Don't provide output for return instruction

* Format

* Add code to insert memrefs

* Format

* Add mlir verification

* Formatting

* Enable pointwise_fusion

* Disable eliminate_data_type

* Set kernel name

* Format

* Fix device name

* Formatting

* Fix output arg

* Format

* Updates

* Update hash

* Add fuse_mlir pass

* Format

* Add fuse mlir

* Format

* Update mlir

* Sort parameter names

* Format

* Reenable disabled passes

* Remove old mlir conv

* Remove asym default padding

* Add more verbose tracing

* Format

* Fix compilation errors

* Format

* Whitelist operators

* Format

* Add namespace

* Format

* Update triple

* Format

* Use func dialect

* Format

* Use func.return

* Format

* Upgrade mlir version

* Add comment

* Handle symmetrical padding

* Format

* Cleanup debug output

* Format

* List failed tests

* Move mlir compile to jit pipeline

* Format

* Update version

* Add source locations

* Format

* Correctly add module

* Format

* Update failed tests

* Fix failures when mlir is disabled

* Format

* Update mlir version

* Check type for fp32

* Format

* Remove failed test

* Update mlir in driver

* Tidy fixes

* Format

* Tidy fixes

* Format

* Fix const

* Remove from requirements

* Fix cmake version

* Fix tidy warning

* Use another ifdef

* Fix tidy

* Other tidy fix

* Format

* Update hash

* Add missing license files

* Format

* Format

* Fix function name

ca8a54fe

30 Jun, 2022 1 commit

Add method to insert multiple instructions (#1178) · 2783c649

Paul Fultz II authored Jun 29, 2022

This is an extension to insert_module_instructions, but instead of just inserting from a module, it can insert a range or a vector of instructions.

29 Jun, 2022 4 commits
- Invalid parameter for yolov4 example (#1275) · 9f74dded
  Chris Austen authored Jun 29, 2022
```
should be --fp16, not --fp16ref
```
- NMS refactor, enable nonstandard shape (#1257) · ad73abbc
  Charlie Lin authored Jun 29, 2022
```
Allows the PyTorch-converted version of SSD-resnet34 to work
```
- Update driver models to use json strings (#1244) · ad27d0d6
  Paul Fultz II authored Jun 29, 2022
```
Compiles significantly faster than constructing all the objects, and also reduces recompiles.
```
- Custom Op example using MIOpen calls (#1208) · 56440c4a
  Umang Yadav authored Jun 28, 2022
```
This PR only adds an example using MIOpen calls.
```
28 Jun, 2022 2 commits
- Custom Op example using rocBLAS calls (#1211) · e914254c
  Umang Yadav authored Jun 28, 2022
```
Add an example using rocBLAS calls
```
- Custom Op example using HIP kernel (#1200) · cb18b0b5
  Umang Yadav authored Jun 28, 2022
```
This PR only adds an example using HIP kernel.
```
26 Jun, 2022 1 commit
- Get parent module in the pass manager (#1181) · 3a5c4306
  Paul Fultz II authored Jun 26, 2022
```
* Add function to get a module tree
* Get parent module in the pass manager
```
25 Jun, 2022 2 commits
- bug fix: register the miopen_fusion op. (#1267) · 3b0a9116
  Brian Pickrell authored Jun 24, 2022
```
One-line fix to register the op miopen_fusion. This error was causing loading of compiled model files (*.mxr) to fail.
```
- Use jit for contiguous operator (#1217) · b75c83d8
  Paul Fultz II authored Jun 24, 2022
```
* Jit contiguous
```
24 Jun, 2022 2 commits

Adding in check_stamped.py to tools/ (#1255) · 8c35fa94

Ted Themistokleous authored Jun 24, 2022

Used to determine which files contain a license and are stamped. If any are not, the script exits with an error code that can later be ingested by another script, along with a list of the outstanding files in question.

The list of files that should or should not contain a license is currently hard-coded, along with some entries to ignore quickly.
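
A minimal C++ sketch of the check described above, for illustration only: the real tool is a Python script, and the marker string, the extension list, and the names `SourceFile`, `unstamped_files`, and `exit_code` are hypothetical. A file of a checked type counts as stamped if it contains the marker text; any unstamped file makes the exit code nonzero.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory view of a repository file.
struct SourceFile {
    std::string path;
    std::string contents;
};

// Returns the paths of checked files that lack the (hypothetical) license marker.
// Files whose extension is not in `checked_exts` are skipped.
std::vector<std::string> unstamped_files(const std::vector<SourceFile>& files,
                                         const std::vector<std::string>& checked_exts,
                                         const std::string& marker) {
    std::vector<std::string> missing;
    for (const auto& f : files) {
        // Extension = text from the last '.' that comes after the last '/'.
        const auto slash = f.path.find_last_of('/');
        const auto dot = f.path.find_last_of('.');
        const bool has_ext = dot != std::string::npos &&
                             (slash == std::string::npos || dot > slash);
        const std::string ext = has_ext ? f.path.substr(dot) : "";
        if (std::find(checked_exts.begin(), checked_exts.end(), ext) == checked_exts.end())
            continue; // not a file type that should carry a license
        if (f.contents.find(marker) == std::string::npos)
            missing.push_back(f.path);
    }
    return missing;
}

// Nonzero exit code when anything is unstamped, so another script can react.
int exit_code(const std::vector<std::string>& missing) {
    return missing.empty() ? 0 : 1;
}
```

Returning the list separately from the exit code mirrors the description: one value for a calling script to branch on, and one list for a human to act on.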

Add compute_method for the experimental custom op (#1194) · edc7be5c

Umang Yadav authored Jun 24, 2022

Adds compute_method for the experimental custom ops.
Adds a test for it using HIP APIs.
Depends on #1183
Solves #1101

23 Jun, 2022 2 commits
- remove eliminate_workspace pass (#1254) · f5760e21
  kahmed10 authored Jun 23, 2022
```
* remove eliminate workspace
* remove sync device and other tags
```
- Fix code block issue with .ipynb files. (#1263) · e95b875f
  Ted Themistokleous authored Jun 22, 2022
```
Regenerate notebook header for licensing
```
22 Jun, 2022 1 commit
- Update license files (#1248) · e44cecbc
  Ted Themistokleous authored Jun 22, 2022
```
Updated each source file in the repo with the existing license.
```
20 Jun, 2022 1 commit
- Fixing misspelled macro to enable MIOpen hidden find mode API (#1250) · c0398ded
  Zhuoran Yin authored Jun 20, 2022
```
* Fixing misspelled macro
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
```
17 Jun, 2022 3 commits

Update lowering of Dot operator (#1247) · c99be32c

Umang Yadav authored Jun 17, 2022



* remove code for allocation of C param in dot lowering

* formatting
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

Update tf_parser to have add_common_op() for parse_relu6 (#1241) · 421a5621

Ted Themistokleous authored Jun 17, 2022



* [#935] Update tf_parser to have add_common_op() for parse_relu6

Similar to onnx_parser.cpp, add an add_common_op template and functionality to support clip-based operations, so that clip operations can be guaranteed to have operands with the same dimensions.

* fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6

* fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6

* fixup! fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6

* fixup! fixup! fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6

* Formatting

* fixup! Formatting
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
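
To illustrate why the operands of a clip need matching dimensions, here is a small C++ sketch that assumes flat `std::vector<float>` tensors. `clip` and `relu6` are hypothetical helpers, not the parser's code: the scalar bounds 0 and 6 are broadcast to the input's shape before the element-wise clip, roughly the role the commit message describes for add_common_op.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise clip: all three operands must already have the same length.
std::vector<float> clip(const std::vector<float>& x,
                        const std::vector<float>& lo,
                        const std::vector<float>& hi) {
    assert(x.size() == lo.size() && x.size() == hi.size());
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        out[i] = std::min(std::max(x[i], lo[i]), hi[i]);
    return out;
}

// relu6(x) = clip(x, 0, 6). The scalar bounds are broadcast to x's shape
// first, so clip only ever sees operands with matching dimensions.
std::vector<float> relu6(const std::vector<float>& x) {
    const std::vector<float> lo(x.size(), 0.0f);
    const std::vector<float> hi(x.size(), 6.0f);
    return clip(x, lo, hi);
}
```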

Create allocate op and replace_allocate pass (#1183) · add6fb3b

kahmed10 authored Jun 17, 2022



* add allocate op header

* formatting

* add replace_allocate pass

* formatting

* move output param to remove_allocate pass

* formatting

* fix bugs in replace_allocate pass

* formatting

* fix verify if tests

* formatting

* move if op logic

* formatting

* cleanup lowering

* cleanup lowering

* formatting

* fix tidy

* formatting

* fix tidy

* add cpu allocate check

* formatting

* change cpu allocate in pass

* formatting

* add some tests for replace_allocate pass

* formatting

* pass by ref

* fix run_pass

* formatting

* update variable name for module

* update dce to use contains() and fix tidy

* formatting

* update cppcheck

* add if test

* formatting

* add if test

* rename var to mod_output_names

* formatting

* remove conditional

* update allocate op and tests

* formatting

* update replace_allocate tests

* update create_output_names() and conditional in replace_allocate

* formatting

* remove extra variable in replace_allocate

* update tools script for allocation_model
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

16 Jun, 2022 2 commits

Instruction distance check fix (#1237) · f5980619

Charlie Lin authored Jun 16, 2022



* Use custom distance function

* Pass module, skip order check if other module

* Change other valid()

* Remove unnecessary declaration

* test multiple module dependency

* Refactor to make more clear

* Code cleanup

* Simplify fix

* Test EXPECT
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

Use env var for creds · e16faac2
Paul authored Jun 16, 2022

10 Jun, 2022 1 commit

Add vectorized reduce (#1202) · aa7ff911

Paul Fultz II authored Jun 09, 2022



Consolidate the vectorize and preload
Add vectorization to reduction
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>

07 Jun, 2022 1 commit

Prioritizing int8 over int8x4 when it is applicable (#1218) · 37c47504

Zhuoran Yin authored Jun 07, 2022



prioritizing int8 over int8x4 when it is applicable
Amend return to continue in apply loop
Adding error handling in case int8x4 compilation failed
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

03 Jun, 2022 1 commit

Group code objects by kernel name in perf report summary (#1234) · 7271ddbc

Paul Fultz II authored Jun 02, 2022

Break up the gpu::code_object entry in the print-out to show the actual kernels:

gpu::code_object::add_kernel: 0.646121ms, 5%
gpu::code_object::mul_kernel: 0.623822ms, 5%
gpu::code_object::add_mul_erf_add_mul_mul_kernel: 0.498902ms, 4%
gpu::code_object::mul_add_kernel: 0.478352ms, 4%
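
A hedged C++ sketch of the grouping idea, assuming each timing sample carries an op name and an optional kernel name. `Timing` and `group_timings` are hypothetical and are not MIGraphX's perf-report code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical timing sample from one profiled run of an instruction.
struct Timing {
    std::string op;     // e.g. "gpu::code_object"
    std::string kernel; // e.g. "add_kernel"; empty when the op has no named kernel
    double ms;
};

// Sums time per "op::kernel" key, falling back to the op name alone,
// so one generic op splits into one summary line per kernel.
std::map<std::string, double> group_timings(const std::vector<Timing>& samples) {
    std::map<std::string, double> totals;
    for (const auto& s : samples) {
        const std::string key = s.kernel.empty() ? s.op : s.op + "::" + s.kernel;
        totals[key] += s.ms;
    }
    return totals;
}
```

Percentages like those in the summary would then be each total divided by the sum of all totals.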

02 Jun, 2022 2 commits
- Fix compilation on Debian bookworm/sid (#1229) · bcac9858
  yves renier authored Jun 02, 2022
```
clang++ complained that std::string was unknown in one file
Authored-by: Yves Renier <102358016+yves-renier@users.noreply.github.com>
```
- Fix dangling reference with gemm add fusion (#1233) · 1339ba35
  Paul Fultz II authored Jun 01, 2022
01 Jun, 2022 1 commit
- Update protobuf version (#1228) · 0cc6304d
  kahmed10 authored Jun 01, 2022
```
update protobuf version
```
31 May, 2022 1 commit

Bump tensorflow from 2.6.4 to 2.7.2 in /examples/nlp/python_bert_squad (#1227) · 6e94e607

dependabot[bot] authored May 31, 2022

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.6.4 to 2.7.2.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.6.4...v2.7.2)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

30 May, 2022 1 commit

Improve eliminate contiguous pass (#1223) · 86061b4d

shivadbhavsar authored May 29, 2022

Follows up on issue #1166 and PR #1220. Using the same approach as #1220 to parallelize the eval calls significantly reduces the time spent in the eliminate_contiguous pass.

27 May, 2022 1 commit
- renamed to main from master (#1226) · d436a723
  Chris Austen authored May 26, 2022
26 May, 2022 2 commits

Parallelize evaluations in propagate_constant (#1220) · bf603a76

shivadbhavsar authored May 26, 2022

Addresses issue #1166: the propagate_constant pass currently uses a recursive approach to find all instructions in a module that can be evaluated to a literal, and performs the replacement in the same call.

New approach:

- Perform a single pass through the instructions in the module to determine which ones can be evaluated
- Evaluate the selected instructions in parallel
- Replace the selected instructions with the corresponding literals
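
The three steps of the new approach can be sketched in C++ on a toy instruction list. `Op`, `Instr`, `eval`, and `propagate_constant` here are hypothetical stand-ins, not MIGraphX's module API. The sketch assumes inputs always precede their users, so one forward pass suffices to classify instructions, and for simplicity it folds every constant non-literal instruction, even ones whose results feed only other constant instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical toy IR: each instruction refers to earlier ones by index.
enum class Op { Literal, Param, Add, Mul };

struct Instr {
    Op op;
    std::vector<std::size_t> inputs;
    double value = 0.0; // only meaningful for Op::Literal
};

// Recursively evaluates an instruction whose inputs are all constant.
double eval(const std::vector<Instr>& mod, std::size_t i) {
    const Instr& ins = mod[i];
    switch (ins.op) {
    case Op::Add: return eval(mod, ins.inputs[0]) + eval(mod, ins.inputs[1]);
    case Op::Mul: return eval(mod, ins.inputs[0]) * eval(mod, ins.inputs[1]);
    case Op::Literal: return ins.value;
    default: assert(false && "parameters cannot be evaluated"); return 0.0;
    }
}

void propagate_constant(std::vector<Instr>& mod) {
    // 1. Single forward pass: an instruction can be evaluated when it is not a
    //    parameter and all of its inputs can be evaluated (literals have none).
    std::vector<bool> is_const(mod.size(), false);
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < mod.size(); ++i) {
        if (mod[i].op == Op::Param) continue;
        bool all_const = true;
        for (std::size_t in : mod[i].inputs) all_const = all_const && is_const[in];
        is_const[i] = all_const;
        if (all_const && mod[i].op != Op::Literal) selected.push_back(i);
    }

    // 2. Evaluate the selected instructions in parallel; tasks only read `mod`.
    std::vector<std::future<double>> pending;
    for (std::size_t i : selected)
        pending.push_back(std::async(std::launch::async, [&mod, i] { return eval(mod, i); }));
    std::vector<double> values;
    for (auto& f : pending) values.push_back(f.get()); // wait for every task

    // 3. Only now overwrite each selected instruction with its literal.
    for (std::size_t k = 0; k < selected.size(); ++k)
        mod[selected[k]] = Instr{Op::Literal, {}, values[k]};
}
```

All results are collected before any instruction is overwritten, so no worker thread can read an instruction while it is being replaced.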

Upgrade to cppcheck 2.8 and fix new issues found (#1225) · a401e72a
Paul Fultz II authored May 26, 2022
```
* Upgrade to cppcheck 2.8
```

25 May, 2022 2 commits

Used wrong path to download the bertsquad-10.onnx model (#1221) · bd746ccf
Chris Austen authored May 25, 2022
```
raw is the download URL for the file; blob is the URL of the GitHub page.
```

Bump tensorflow from 2.5.3 to 2.6.4 in /examples/nlp/python_bert_squad (#1219) · 4e18f991

dependabot[bot] authored May 25, 2022

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.5.3 to 2.6.4.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.5.3...v2.6.4)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

24 May, 2022 4 commits

Improve applicable batched gemms (#1214) · bf0a4713
Paul Fultz II authored May 24, 2022
```
* Improve applicable batched gemms for bert
```

Remove std references in runtime compilation (#1186) · 150d6d20

Paul Fultz II authored May 24, 2022

Remove std references in runtime compilation, since they are not available when using hiprtc and the headers may not be present on the system.

Fuse gemm add with pointwise fusions (#1213) · a500620e
Paul Fultz II authored May 24, 2022
```
* Fuse gemm add with pointwise fusions
```

Fix onnx mean parsing for integral inputs (#1209) · d895104a

shivadbhavsar authored May 23, 2022

As described in #1196, the ONNX mean parser does not work correctly for integral types. This update fixes the issue by handling integral types separately, where summation is performed before division. Additional test cases have also been added for handling integral types.
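
A sketch of why the order matters for integers, using plain `std::vector` inputs. `onnx_mean` is a hypothetical helper, not the parser, which emits graph instructions rather than computing values. With integer inputs 1, 1, 1, dividing each by 3 first gives 0 + 0 + 0 = 0, while summing first gives 3 / 3 = 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Element-wise mean of equally sized inputs (toy stand-in for ONNX Mean).
template <class T>
std::vector<T> onnx_mean(const std::vector<std::vector<T>>& inputs) {
    assert(!inputs.empty());
    const T n = static_cast<T>(inputs.size());
    std::vector<T> out(inputs.front().size(), T(0));
    for (std::size_t j = 0; j < out.size(); ++j) {
        if (std::is_integral<T>::value) {
            // Integral: sum first, divide once, so truncation happens only at
            // the end (assumes the sum fits in T).
            T sum = 0;
            for (const auto& in : inputs) sum += in[j];
            out[j] = sum / n;
        } else {
            // Floating point: dividing each input by n before summing is
            // acceptable up to rounding.
            for (const auto& in : inputs) out[j] += in[j] / n;
        }
    }
    return out;
}
```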

20 May, 2022 1 commit

Rename pointwise ops (#1145) · 4a312201

kahmed10 authored May 20, 2022

Improves the clarity of kernel names seen when profiling. The new names follow the order of the ops being compiled; for example, add followed by relu becomes add_relu_kernel.
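
A one-function C++ sketch of the naming scheme. `pointwise_kernel_name` is hypothetical and only reproduces the pattern described for this change: op names joined by underscores in compile order, plus a `_kernel` suffix.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: op names in compile order, joined with '_', plus "_kernel".
std::string pointwise_kernel_name(const std::vector<std::string>& ops) {
    assert(!ops.empty());
    std::string name;
    for (const auto& op : ops) {
        if (!name.empty()) name += '_';
        name += op;
    }
    return name + "_kernel";
}
```

For example, `pointwise_kernel_name({"add", "relu"})` returns `"add_relu_kernel"`.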
