Commits · 1e4e9c577593d951c03c5eabfdc71c5b7809e4b4 · gaoqiong / MIGraphX

14 Jun, 2022 5 commits
- Format · 1e4e9c57
  Paul authored Jun 14, 2022
  
  1e4e9c57
- Adjust transpose slice · fa197568
  Paul authored Jun 14, 2022
  
  fa197568
- Merge branch 'unsqueeze-step' into fuse-horiz-contiguous2 · c96b88b7
  Paul authored Jun 14, 2022
  
  c96b88b7
- Format · c2c7f497
  Paul authored Jun 13, 2022
  
  c2c7f497
- Add step to unsqeeze · 3101f6fe
  Paul authored Jun 13, 2022
  
  3101f6fe
13 Jun, 2022 1 commit
- Use std::transform · 2a8e8d07
  Paul authored Jun 13, 2022
  
  2a8e8d07
10 Jun, 2022 1 commit

Add vectorized reduce (#1202) · aa7ff911

Paul Fultz II authored Jun 09, 2022



* Consolidate the vectorize and preload
* Add vectorization to reduction
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>

aa7ff911

08 Jun, 2022 2 commits
- Rename class · 9840f756
  Paul authored Jun 08, 2022
  
  9840f756
- Fix copy-paste · 4ddaf83e
  Paul authored Jun 08, 2022
  
  4ddaf83e
07 Jun, 2022 1 commit

Prioritizing int8 over int8x4 when it is applicable (#1218) · 37c47504

Zhuoran Yin authored Jun 07, 2022



* Prioritizing int8 over int8x4 when it is applicable
* Amend return to continue in apply loop
* Adding error handling in case int8x4 compilation failed
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

37c47504

03 Jun, 2022 1 commit

Group code objects by kernel name in perf report summary (#1234) · 7271ddbc

Paul Fultz II authored Jun 02, 2022

Break up the gpu::code_object  print to show the actual kernels...

gpu::code_object::add_kernel: 0.646121ms, 5%
gpu::code_object::mul_kernel: 0.623822ms, 5%
gpu::code_object::add_mul_erf_add_mul_mul_kernel: 0.498902ms, 4%
gpu::code_object::mul_add_kernel: 0.478352ms, 4%

7271ddbc
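
The grouping described in this commit can be pictured with a minimal Python sketch. This is not MIGraphX's implementation: the `summarize` function and the `(op_name, kernel_name, milliseconds)` record layout are hypothetical, and only the `gpu::code_object::<kernel>` label format is taken from the output above.

```python
from collections import defaultdict


def summarize(timings):
    """Aggregate per-instruction timings into a per-kernel summary.

    timings: iterable of (op_name, kernel_name, milliseconds) tuples.
    kernel_name is None for operators that are not code objects.
    (Hypothetical record layout, chosen for illustration only.)
    """
    totals = defaultdict(float)
    for op_name, kernel_name, ms in timings:
        # Code objects are keyed by kernel name so that each kernel gets its
        # own row instead of one combined "gpu::code_object" row.
        key = f"{op_name}::{kernel_name}" if kernel_name else op_name
        totals[key] += ms

    grand_total = sum(totals.values())
    rows = [(key, ms, 100.0 * ms / grand_total) for key, ms in totals.items()]
    # Slowest entries first, as in a typical perf summary.
    return sorted(rows, key=lambda row: row[1], reverse=True)


sample = [
    ("gpu::code_object", "add_kernel", 0.5),
    ("gpu::code_object", "mul_kernel", 0.125),
    ("gpu::code_object", "add_kernel", 0.25),
    ("gpu::gemm", None, 0.125),
]
summary = summarize(sample)
for key, ms, percent in summary:
    print(f"{key}: {ms}ms, {percent:.0f}%")
```

Keying on the kernel name keeps repeated launches of the same kernel in one row while still separating distinct kernels.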

02 Jun, 2022 2 commits
- Fix compilation on Debian bookworm/sid (#1229) · bcac9858
  yves renier authored Jun 02, 2022
```
clang++ complained about not knowing of std::string for a file
Authored-by: Yves Renier <102358016+yves-renier@users.noreply.github.com>
```
  bcac9858
- Fix dangling reference with gemm add fusion (#1233) · 1339ba35
  Paul Fultz II authored Jun 01, 2022
  
  1339ba35
01 Jun, 2022 4 commits
- Update protobuf version (#1228) · 0cc6304d
  kahmed10 authored Jun 01, 2022
```
update protobuf version
```
  0cc6304d
- Rename var · 8e832756
  Paul authored May 31, 2022
  
  8e832756
- Format · a7a4cd04
  Paul authored May 31, 2022
  
  a7a4cd04
- Add eliminate contiguous test · d3236e31
  Paul authored May 31, 2022
  
  d3236e31
31 May, 2022 3 commits

Fix warning · 8ea65187
Paul authored May 31, 2022

8ea65187
Merge · 5a1af3d1
Paul authored May 31, 2022

5a1af3d1

Bump tensorflow from 2.6.4 to 2.7.2 in /examples/nlp/python_bert_squad (#1227) · 6e94e607

dependabot[bot] authored May 31, 2022

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.6.4 to 2.7.2.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.6.4...v2.7.2)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

6e94e607

30 May, 2022 1 commit

Improve eliminate contiguous pass (#1223) · 86061b4d

shivadbhavsar authored May 29, 2022

Following up on issue #1166 and PR #1220. Using the same approach as in #1220 for parallelizing the eval calls, we can significantly reduce the time spent on eliminate_contiguous pass.

86061b4d

27 May, 2022 1 commit
- renamed to main from master (#1226) · d436a723
  Chris Austen authored May 26, 2022
  
  d436a723
26 May, 2022 2 commits

Parallelize evaluations in propagate_constant (#1220) · bf603a76

shivadbhavsar authored May 26, 2022

Addressing issue #1166 - propagate_constant pass currently uses a recursive approach to find all instructions in a module that can be evaluated to a literal and performs the replacement in the same call.

New approach:

- Perform a single pass through the instructions in the module to determine which instructions can be evaluated
- Evaluate the selected instructions in parallel
- Replace the selected instructions with the corresponding literals

bf603a76
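
The three steps of the new approach can be sketched in Python as follows. This is an illustration under stated assumptions, not the code from the PR: the module is modeled as a dict of instructions in topological order, and the names `OPS`, `evaluate`, and `propagate_constant` as well as the `literal`/`param` instruction layout are hypothetical.

```python
import operator
from concurrent.futures import ThreadPoolExecutor

# Hypothetical operator table; a real compiler has many more operators.
OPS = {"add": operator.add, "mul": operator.mul}


def evaluate(name, module):
    """Evaluate an instruction whose inputs are all compile-time constants."""
    ins = module[name]
    if ins["op"] == "literal":
        return ins["value"]
    args = [evaluate(arg, module) for arg in ins["inputs"]]
    return OPS[ins["op"]](*args)


def propagate_constant(module):
    """module: dict of name -> instruction, assumed to be in topological order."""
    # Step 1: a single pass marks every instruction that depends only on literals.
    constant = set()
    for name, ins in module.items():
        if ins["op"] == "literal":
            constant.add(name)
        elif ins["op"] in OPS and all(arg in constant for arg in ins["inputs"]):
            constant.add(name)
    selected = [n for n in module if n in constant and module[n]["op"] != "literal"]

    # Step 2: evaluate the selected instructions in parallel. The module is
    # only read here, so the workers do not interfere with each other.
    with ThreadPoolExecutor() as pool:
        values = list(pool.map(lambda n: evaluate(n, module), selected))

    # Step 3: replace each selected instruction with its literal.
    result = dict(module)
    for name, value in zip(selected, values):
        result[name] = {"op": "literal", "value": value}
    return result


module = {
    "a": {"op": "literal", "value": 2},
    "b": {"op": "literal", "value": 3},
    "x": {"op": "param", "inputs": []},
    "c": {"op": "add", "inputs": ["a", "b"]},
    "d": {"op": "mul", "inputs": ["c", "x"]},
}
folded = propagate_constant(module)
```

Separating selection from replacement means the evaluation step never observes a partially rewritten module, which is what makes it safe to run in parallel in this sketch.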

Upgrade to cppcheck 2.8 and fix new issues found (#1225) · a401e72a
Paul Fultz II authored May 26, 2022
```
* Upgrade to cppcheck 2.8
```
a401e72a

25 May, 2022 2 commits

Used wrong path to download the bertsquad-10.onnx model (#1221) · bd746ccf
Chris Austen authored May 25, 2022
```
raw is the download for the file, blob is the url for the github page.
```
bd746ccf

Bump tensorflow from 2.5.3 to 2.6.4 in /examples/nlp/python_bert_squad (#1219) · 4e18f991

dependabot[bot] authored May 25, 2022

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.5.3 to 2.6.4.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.5.3...v2.6.4)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

4e18f991

24 May, 2022 4 commits

Improve applicable batched gemms (#1214) · bf0a4713
Paul Fultz II authored May 24, 2022
```
* Improve applicable batched gemms for bert
```
bf0a4713

Remove std references in runtime compilation (#1186) · 150d6d20

Paul Fultz II authored May 24, 2022

Remove std references in runtime compilation since these are not available when using hiprtc and the headers may not be available on the system

150d6d20

Fuse gemm add with pointwise fusions (#1213) · a500620e
Paul Fultz II authored May 24, 2022
```
* Fuse gemm add with pointwise fusions
```
a500620e

Fix onnx mean parsing for integral inputs (#1209) · d895104a

shivadbhavsar authored May 23, 2022

As described in #1196, the ONNX mean parser does not work correctly for integral types. This update fixes the issue by handling integral types separately, where summation is performed before division. Additional test cases have also been added for handling integral types.

d895104a
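
Why the order of summation and division matters for integral types can be seen in a small Python sketch. It assumes, as the description suggests but does not state outright, that the general path divides each input by the number of inputs before accumulating. The function names are hypothetical, and `//` stands in for integer division (for the non-negative values used here it matches C++ truncation).

```python
def mean_divide_first(tensors):
    """Element-wise mean that divides each input before summing."""
    n = len(tensors)
    return [sum(x // n for x in elems) for elems in zip(*tensors)]


def mean_sum_first(tensors):
    """Element-wise mean that sums all inputs first and divides once."""
    n = len(tensors)
    return [sum(elems) // n for elems in zip(*tensors)]


inputs = [[1, 1], [1, 3]]
truncated = mean_divide_first(inputs)  # each 1 // 2 is already 0
correct = mean_sum_first(inputs)       # (1 + 1) // 2 and (1 + 3) // 2
```

With integer operands, dividing first discards the remainder of every term, so the errors accumulate. Dividing once after the sum loses at most one remainder per element.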

20 May, 2022 4 commits
- Rename pointwise ops (#1145) · 4a312201
  kahmed10 authored May 20, 2022
```
For clarity on kernel names found when profiling. The new names are set to the order of the ops being compiled. For example: add + relu = add_relu_kernel.
```
  4a312201
- Improve matching with has_value when there are convert operators (#1212) · 27af0170
  Paul Fultz II authored May 19, 2022
  
  27af0170
- Format · dfc7bbac
  Paul authored May 19, 2022
  
  dfc7bbac
- Fix contiguous after splits · 3fb1b7f3
  Paul authored May 19, 2022
  
  3fb1b7f3
19 May, 2022 2 commits
- Format · 14ac1aed
  Paul authored May 19, 2022
  
  14ac1aed
- Add tests · 8d87d83a
  Paul authored May 19, 2022
  
  8d87d83a
17 May, 2022 3 commits
- Format · 39bbf87c
  Paul authored May 17, 2022
  
  39bbf87c
- Horizontally fuse contiguous · dcd3d04b
  Paul authored May 17, 2022
  
  dcd3d04b
- renamed variables for module from p to m (#1204) · a27dd28c
  shivadbhavsar authored May 17, 2022
```
Updated variable names according to #1193
```
  a27dd28c
13 May, 2022 1 commit

Update install_prereqs.sh for individual use (#1197) · 8c94ad07

Chris Austen authored May 13, 2022

Our documentation indicates a user with sudo can run the install_prereqs.sh file. Turns out that the file is not complete enough to run on Ubuntu 18.04/20.04 independently. I updated the file to resolve the failures.

resolves #1191

8c94ad07