Commits · 5af79bd7e95e5951cadd54849eea6958685de4e5 · gaoqiong / MIGraphX

31 May, 2022 4 commits

Remove layernorm op · 5af79bd7
turneram authored May 31, 2022

5af79bd7

Merge remote-tracking branch 'origin/develop' into bert-attention-no-transpose-ops · 89068ad1
turneram authored May 31, 2022

89068ad1

Bump tensorflow from 2.6.4 to 2.7.2 in /examples/nlp/python_bert_squad (#1227) · 6e94e607

dependabot[bot] authored May 31, 2022

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.6.4 to 2.7.2.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.6.4...v2.7.2)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

6e94e607

Merge remote-tracking branch 'origin/develop' into bert-attention-no-transpose-ops · be38aff9
turneram authored May 31, 2022

be38aff9

30 May, 2022 1 commit

Improve eliminate contiguous pass (#1223) · 86061b4d

shivadbhavsar authored May 29, 2022

This follows up on issue #1166 and PR #1220. Using the same approach as #1220 to parallelize the eval calls significantly reduces the time spent in the eliminate_contiguous pass.

86061b4d

27 May, 2022 1 commit
- renamed to main from master (#1226) · d436a723
  Chris Austen authored May 26, 2022
  
  d436a723
26 May, 2022 5 commits
- Parallelize evaluations in propagate_constant (#1220) · bf603a76
  shivadbhavsar authored May 26, 2022
```
Addresses issue #1166. The propagate_constant pass currently uses a recursive approach to find all instructions in a module that can be evaluated to a literal, and performs the replacement in the same call.

New approach:

1. Perform a single pass through the instructions in the module to determine which instructions can be evaluated.
2. Evaluate the selected instructions in parallel.
3. Replace the selected instructions with the corresponding literals.
```
  bf603a76
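
The three steps listed in the commit message above can be sketched roughly as follows. This is a toy Python illustration, not MIGraphX code: the `Instr` class, the `OPS` table, and the `propagate_constants` function are hypothetical names invented for this sketch, and a thread pool stands in for whatever parallel mechanism the real pass uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
import operator

# Hypothetical toy IR: an instruction is a literal, a parameter, or an op.
OPS = {"add": operator.add, "mul": operator.mul}


@dataclass
class Instr:
    op: str                                   # "literal", "param", or a key of OPS
    inputs: list = field(default_factory=list)
    value: object = None                      # only meaningful for literals


def evaluate(ins):
    """Evaluate an instruction whose inputs all reduce to literals."""
    if ins.op == "literal":
        return ins.value
    return OPS[ins.op](*(evaluate(i) for i in ins.inputs))


def propagate_constants(module):
    # Step 1: a single pass over the module (assumed to be in topological
    # order) marks the instructions that depend only on literals.
    const = set()
    for ins in module:
        if ins.op == "literal" or (
            ins.op in OPS and all(id(i) in const for i in ins.inputs)
        ):
            const.add(id(ins))
    selected = [i for i in module if i.op != "literal" and id(i) in const]

    # Step 2: evaluate the selected instructions in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(evaluate, selected))

    # Step 3: replace each selected instruction with its literal.
    for ins, val in zip(selected, results):
        ins.op, ins.inputs, ins.value = "literal", [], val
    return module
```

Keeping the replacement in a separate final step means the parallel evaluations only read the graph, so they do not need to coordinate with each other.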
- Upgrade to cppcheck 2.8 and fix new issues found (#1225) · a401e72a
  Paul Fultz II authored May 26, 2022
```
* Upgrade to cppcheck 2.8
```
  a401e72a
- Formatting · 74b947ed
  turneram authored May 26, 2022
  
  74b947ed
- Use parse_layernorm to un-fuse LayerNormalization op · 6ca16d98
  turneram authored May 26, 2022
  
  6ca16d98
- Merge remote-tracking branch 'origin/develop' into bert-attention-no-transpose-ops · 503188d5
  turneram authored May 26, 2022
  
  503188d5
25 May, 2022 2 commits

Used wrong path to download the bertsquad-10.onnx model (#1221) · bd746ccf
Chris Austen authored May 25, 2022
```
raw is the download URL for the file; blob is the URL of the GitHub page.
```
bd746ccf
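
The raw/blob distinction in the commit message above can be illustrated with a small helper. This is a minimal sketch: `blob_to_raw` is a hypothetical function, and the URL in the usage note is a placeholder, not the actual path of the bertsquad-10.onnx model.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw(url):
    """Turn a GitHub 'blob' page URL into the matching 'raw' download URL."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected path layout: /<owner>/<repo>/blob/<ref>/<file path...>
    if len(segments) > 3 and segments[3] == "blob":
        segments[3] = "raw"
    return urlunsplit(parts._replace(path="/".join(segments)))
```

For example, `blob_to_raw("https://github.com/owner/repo/blob/main/models/model.onnx")` yields the same URL with `/raw/` in place of `/blob/`. Downloading the `blob` URL would fetch the HTML page rather than the model file.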

Bump tensorflow from 2.5.3 to 2.6.4 in /examples/nlp/python_bert_squad (#1219) · 4e18f991

dependabot[bot] authored May 25, 2022

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.5.3 to 2.6.4.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.5.3...v2.6.4)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

4e18f991

24 May, 2022 4 commits

Improve applicable batched gemms (#1214) · bf0a4713
Paul Fultz II authored May 24, 2022
```
* Improve applicable batched gemms for bert
```
bf0a4713

Remove std references in runtime compilation (#1186) · 150d6d20

Paul Fultz II authored May 24, 2022

Remove std references in runtime compilation, since these are not available when using hiprtc and the headers may not be available on the system.

150d6d20

Fuse gemm add with pointwise fusions (#1213) · a500620e
Paul Fultz II authored May 24, 2022
```
* Fuse gemm add with pointwise fusions
```
a500620e

Fix onnx mean parsing for integral inputs (#1209) · d895104a

shivadbhavsar authored May 23, 2022

As described in #1196, the ONNX mean parser does not work correctly for integral types. This update fixes the issue by handling integral types separately, where summation is performed before division. Additional test cases have also been added for handling integral types.

d895104a
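
Why the order of summation and division matters for integral types can be seen in a small sketch. This is plain Python rather than MIGraphX code, both function names are hypothetical, and it assumes (the commit message does not say) that the problematic path divided each input by the input count before accumulating. Python's `//` floors, which matches truncating division for the non-negative values used here.

```python
def mean_divide_first(inputs):
    """Element-wise mean that divides each input by N before summing."""
    n = len(inputs)
    return [sum(v // n for v in col) for col in zip(*inputs)]


def mean_sum_first(inputs):
    """Element-wise mean that sums all inputs, then divides once."""
    n = len(inputs)
    return [sum(col) // n for col in zip(*inputs)]


a = [1, 2, 3]
b = [2, 3, 4]
c = [3, 4, 5]

print(mean_divide_first([a, b, c]))  # [1, 2, 3], each term is truncated
print(mean_sum_first([a, b, c]))     # [2, 3, 4], the exact mean
```

Dividing first discards a remainder from every term, so the error grows with the number of inputs; summing first truncates only once.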

20 May, 2022 17 commits
- Formatting · de433392
  turneram authored May 20, 2022
  
  de433392
- Add num_heads to attention node · f14e2a44
  turneram authored May 20, 2022
  
  f14e2a44
- Add contiguous before reshape · eff3d2d3
  turneram authored May 20, 2022
  
  eff3d2d3
- Formatting · cd96c1c8
  turneram authored May 20, 2022
  
  cd96c1c8
- Remove transpose kernels · 37351ed6
  turneram authored May 20, 2022
  
  37351ed6
- Formatting · 48187e79
  turneram authored May 20, 2022
  
  48187e79
- Add attention verify_onnx test · 6202ea15
  turneram authored May 20, 2022
  
  6202ea15
- Formatting · b745f416
  turneram authored May 20, 2022
  
  b745f416
- Fix layernorm verify test · a16adb42
  turneram authored May 20, 2022
  
  a16adb42
- Formatting · d41f0d66
  turneram authored May 20, 2022
  
  d41f0d66
- Add transposectx and transposeqkv ref tests · 095e49a3
  turneram authored May 20, 2022
  
  095e49a3
- Remove non-inference portions of parse_attention · 7757cfd0
  turneram authored May 20, 2022
  
  7757cfd0
- Generate layernorm onnx file · 5a62e9e7
  turneram authored May 20, 2022
  
  5a62e9e7
- Formatting · 05e8bfde
  turneram authored May 20, 2022
  
  05e8bfde
- Add attention, layernorm op, transposectx, and transposeqkv · 3ea9fe4c
  turneram authored May 20, 2022
  
  3ea9fe4c
- Rename pointwise ops (#1145) · 4a312201
  kahmed10 authored May 20, 2022
```
Renames kernels for clarity when profiling. The new names follow the order of the ops being compiled. For example: add + relu = add_relu_kernel.
```
  4a312201
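
The naming rule from the commit message above amounts to joining the op names in compilation order. A minimal sketch, using a hypothetical helper name (the actual MIGraphX naming code may differ):

```python
def pointwise_kernel_name(ops):
    """Build a kernel name from op names, in the order they are compiled."""
    return "_".join(ops) + "_kernel"


print(pointwise_kernel_name(["add", "relu"]))  # add_relu_kernel
```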
- Improve matching with has_value when there are convert operators (#1212) · 27af0170
  Paul Fultz II authored May 19, 2022
  
  27af0170
17 May, 2022 1 commit
- renamed variables for module from p to m (#1204) · a27dd28c
  shivadbhavsar authored May 17, 2022
```
Updated variable names according to #1193
```
  a27dd28c
13 May, 2022 1 commit

Update install_prereqs.sh for individual use (#1197) · 8c94ad07

Chris Austen authored May 13, 2022

Our documentation indicates that a user with sudo can run the install_prereqs.sh file. It turns out that the file is not complete enough to run on Ubuntu 18.04/20.04 on its own, so I updated it to resolve the failures.

resolves #1191

8c94ad07

11 May, 2022 2 commits
- Prefuse layernorm for gpu (#1190) · 671f24be
  Paul Fultz II authored May 11, 2022
```
Fuse layernorm and add a triadd_layernorm fusion. This is a preparatory performance improvement.
```
  671f24be
- Updated a path to the bert-squad onnx file after upstream changed path (#1201) · 4ec8209f
  Chris Austen authored May 10, 2022
```
ONNX Models changed from master to main. Changing the path to reflect the proper location.
```
  4ec8209f
10 May, 2022 1 commit
- Expose `add_literal` in C and Python API (#1173) · 5e5ed37a
  Umang Yadav authored May 10, 2022
```
Expose add_literal method in C/C++ api
```
  5e5ed37a
09 May, 2022 1 commit

Refactor vectorization and preloading for pointwise fusions (#1184) · ddbbe54b

Paul Fultz II authored May 09, 2022

Improves performance for add_gelu. In bert it is 4x faster and for mul_add it is 50% faster than what we currently have.

ddbbe54b