Commits · ecb1545c7f73a033f1e2dba6fed337d1f76fe6cb · gaoqiong / MIGraphX

16 Feb, 2022 1 commit
- Add assign_to method for C++ API (#1075) · ecb1545c
  kahmed10 authored Feb 16, 2022
  
  ecb1545c
11 Feb, 2022 2 commits
- Fix hang with CSE pass when using submodules (#1050) · 48585bad
  kahmed10 authored Feb 11, 2022
```
* add submodule test
* remove for loop
* simplify reshape test
```
  48585bad
- Update missing copyright information for run_onnx_squad.py (#1069) · 81869844
  Chris Austen authored Feb 11, 2022
```
Update copyright for a python file that was modified when added to our code 
```
  81869844
09 Feb, 2022 2 commits
- Enable pointwise fusion by default (#1082) · c7419a9c
  Paul Fultz II authored Feb 09, 2022
```
There is now a MIGRAPHX_DISABLE_POINTWISE_FUSION to disable it
```
  c7419a9c
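
As a rough illustration of how the opt-out might be used from a script, the sketch below builds an environment with the variable set. The helper name `env_without_pointwise_fusion` is hypothetical, and the value `"1"` is an assumption; the commit message only states that the variable exists.

```python
import os


def env_without_pointwise_fusion(base=None):
    """Return a copy of an environment with pointwise fusion disabled."""
    env = dict(os.environ if base is None else base)
    # Assumption: any "enabled" value such as "1" turns the flag on.
    env["MIGRAPHX_DISABLE_POINTWISE_FUSION"] = "1"
    return env


env = env_without_pointwise_fusion({"PATH": "/usr/bin"})
print(env["MIGRAPHX_DISABLE_POINTWISE_FUSION"])
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.run` when launching a program that uses MIGraphX.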
- Support nonstandard shapes for the Squeeze Op (#1068) · e64b773f
  Umang Yadav authored Feb 09, 2022
```
Support slice, broadcast and transpose shapes for the squeeze op.
```
  e64b773f
08 Feb, 2022 3 commits

File ext rename (#1078) · a30ec101
Charlie Lin authored Feb 08, 2022
```
Changed MessagePack file extensions to mxr.
```
a30ec101

Add missing output_alias to miopen_fusion op (#1076) · b304d97d

Paul Fultz II authored Feb 08, 2022

The missing output_alias caused incorrect memory coloring, which in turn caused the accuracy failures in the vision models when pointwise fusions were enabled. Resnet50, inceptionv3 and inceptionv4 now verify in the driver.

b304d97d

Enforce types to avoid compilation error in pointwise fusions (#1077) · 73b8a773
Paul Fultz II authored Feb 08, 2022
```
Enforce types to avoid compilation error in pointwise fusions
This fixes compile failure: gpt-2, fp16 on Navi
```
73b8a773

02 Feb, 2022 1 commit

Update trace_eval to preview the output buffers (#1073) · b20e3d4d

Paul Fultz II authored Feb 02, 2022

Currently, MIGRAPHX_TRACE_EVAL=2 prints out the entire output buffer, which can produce a lot of output. To make buffers easier to inspect and debug, MIGRAPHX_TRACE_EVAL=2 now prints only 10 elements from the buffer (the first 5 and the last 5) and shows any floating-point classifications found in the buffer (e.g. NaNs and infinities). The previous behavior can still be enabled with MIGRAPHX_TRACE_EVAL=3.

b20e3d4d
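
The preview behavior described above can be sketched in a few lines. This is only an illustration of the idea (first 5 and last 5 elements plus the floating-point classes seen), not the MIGraphX implementation; the function name `preview` and the class labels are made up for the example.

```python
import math


def preview(buffer, edge=5):
    """Return the first/last `edge` elements and the fp classes found."""
    values = list(buffer)
    if len(values) <= 2 * edge:
        shown = values
    else:
        # Elide the middle of long buffers.
        shown = values[:edge] + ["..."] + values[-edge:]
    classes = set()
    for v in values:  # classify every element, not only the shown ones
        if math.isnan(v):
            classes.add("nan")
        elif math.isinf(v):
            classes.add("inf")
        elif v == 0:
            classes.add("zero")
        else:
            classes.add("finite")
    return shown, sorted(classes)


data = [float(i) for i in range(20)]
data[7] = float("nan")  # hidden in the elided middle, still reported
print(preview(data))
```

Classifying the whole buffer rather than only the printed elements is what makes the short preview useful: a NaN in the elided middle still shows up in the summary.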

01 Feb, 2022 1 commit
- Add python type annotations to api.py (#1061) · 2a79a9ff
  Paul Fultz II authored Feb 01, 2022
```
This will also check the types using mypy on the CI.
```
  2a79a9ff
31 Jan, 2022 1 commit
- Parse upsample (#1060) · 7e7ef0b8
  Shucai Xiao authored Jan 31, 2022
```
* use the parse_resize to parse the upsample operator
```
  7e7ef0b8
28 Jan, 2022 3 commits
- Add cppcheck to examples (#1070) · d0543c96
  Paul Fultz II authored Jan 28, 2022
```
Add cppcheck to examples
```
  d0543c96
- Add auto-vectorization of pointwise operators (#1047) · 78a3c9b7
  Paul Fultz II authored Jan 28, 2022
```
* Enable auto vectorization
* Handle vector types with convert function
* Don't vectorize when it will cause problems with preload
```
  78a3c9b7
- Add Mean op ONNX parser (#1065) · b7218806
  turneram authored Jan 28, 2022
```
* Add mean op onnx parser and unit tests
* Refactor parse_mean to use add_broadcastable_binary_op
```
  b7218806
27 Jan, 2022 1 commit
- Remove Standard Shape requirement for ArgOps (#1042) · 332cb710
  Umang Yadav authored Jan 27, 2022
```
Allow non-standard shapes for the arg ops. Non-standard shapes include broadcast, slice and transpose.
```
  332cb710
26 Jan, 2022 1 commit

Add HardSwish op ONNX parser (#1066) · 7477aeb8

turneram authored Jan 26, 2022

Add HardSwish to HardSigmoid parser

The HardSwish formula is y = x * HardSigmoid<alpha=1/6, beta=0.5>(x)
The HardSigmoid parser sets alpha to 1/6 and adds the mul instruction if the op name is HardSwish

Resolves #1062

7477aeb8
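
The formula above can be checked with a small scalar sketch. It follows the ONNX definition of HardSigmoid, max(0, min(1, alpha * x + beta)), with the ONNX defaults alpha = 0.2 and beta = 0.5; the function names are illustrative and this is not the parser code.

```python
def hard_sigmoid(x, alpha=0.2, beta=0.5):
    """ONNX HardSigmoid: clamp alpha * x + beta to the range [0, 1]."""
    return max(0.0, min(1.0, alpha * x + beta))


def hard_swish(x):
    """HardSwish: x multiplied by HardSigmoid with alpha = 1/6, beta = 0.5."""
    return x * hard_sigmoid(x, alpha=1.0 / 6.0, beta=0.5)


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(x, hard_swish(x))
```

For x <= -3 the result is 0, and for x >= 3 the result is x, which is why a single extra mul on top of HardSigmoid is enough.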

21 Jan, 2022 4 commits
- GreaterOrEqual ONNX parser (#1044) · 60aa1c85
  turneram authored Jan 21, 2022
```
Add onnx parser for operator GreaterOrEqual
```
  60aa1c85
- SoftSign ONNX parser (#1046) · ebb15dd3
  turneram authored Jan 21, 2022
```
Add onnx parser and unit tests for Softsign
```
  ebb15dd3
- SoftPlus ONNX parser (#1045) · 4c90e9a3
  turneram authored Jan 20, 2022
```
* Add onnx parser and unit test
```
  4c90e9a3
- Improve handling of generator expressions when getting the flags for hip (#1055) · 3f392a3b
  Paul Fultz II authored Jan 20, 2022
```
* Improve handling of generator expressions when getting the flags for hip
```
  3f392a3b
20 Jan, 2022 2 commits
- Add env variable to dump tests to a file (#1041) · 51b4439f
  Paul Fultz II authored Jan 20, 2022
  
  51b4439f
- Update satackey/action-docker-layer-caching to v 0.0.11 (#1057) · 0fbbba26
  Chris Austen authored Jan 20, 2022
```
There have been hangs in the CI runs recently. GitHub runner jobs
are failing due to exceeding the file system size. Upgrading to 0.0.11
resolves this issue.
```
  0fbbba26
17 Jan, 2022 1 commit
- Make clip a pointwise op (#1043) · b0ece214
  Paul Fultz II authored Jan 17, 2022
```
Make clip a pointwise op
```
  b0ece214
11 Jan, 2022 1 commit

HardSigmoid ONNX parser (#1040) · fc42d852

turneram authored Jan 11, 2022

Add HardSigmoid onnx parser and unit tests
Produces the mathematical equivalent of the ONNX operator through a combination of existing pointwise ops.
Resolves #1028

fc42d852

10 Jan, 2022 1 commit
- Handle miopen fusions when using pointwise fusions (#1019) · 534a05c1
  Paul Fultz II authored Jan 10, 2022
```
* Add matcher for conv_bias pointwise
* Add fusion op
```
  534a05c1
05 Jan, 2022 1 commit
- Fix time seed bug in random sequence ops (#1027) · 594f2802
  turneram authored Jan 05, 2022
```
Fix bug caused by casting time seed to float
```
  594f2802
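
One plausible way such a cast goes wrong is loss of precision. The sketch below assumes the seed is a Unix timestamp in seconds and that "float" means a 32-bit float; neither detail is stated in the commit message. With a 24-bit significand, values near 1.6e9 are spaced 128 apart, so nearby timestamps collapse to the same seed.

```python
import struct


def to_float32(value):
    """Round a number to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


t = 1641340800  # 2022-01-05 00:00:00 UTC as epoch seconds (assumed seed)
# Consecutive seconds map to the same float32 value.
print(to_float32(t) == to_float32(t + 1))
# The rounded value is off from the true timestamp by the lost low bits.
print(int(to_float32(t + 100)) - (t + 100))
```

Keeping the seed in an integer type avoids this collapse.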
10 Dec, 2021 1 commit

Updates to examples (#1022) · 46b0c33b

Cagri authored Dec 10, 2021

nfnet update
3dunet requirements via pip
3dunet requirement and nb-clean

46b0c33b

09 Dec, 2021 2 commits

Softmax perf optimization (#1014) · 2e337c7f

Shucai Xiao authored Dec 09, 2021

Changed the number of threads in a block from 256 to 128.
Increased the max number of blocks in the kernel from 256 to 1M.
When the axis is the last dimension, removed the index computation since it is not required.

With these changes, we get about a 2x speedup compared to the develop branch for the softmax op used in the BertSquad model.

2e337c7f
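
The last-dimension shortcut can be illustrated on the CPU. In a row-major buffer, a softmax over the last axis works on contiguous runs, so no per-element index arithmetic is needed, whereas the general case has to compute offsets from strides. This is a plain Python sketch of that idea, not the GPU kernel; all function names are illustrative.

```python
import math


def softmax_1d(values):
    """Numerically stable softmax of a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_last_axis(flat, n):
    """Last axis: each slice is a contiguous run of n elements."""
    out = []
    for start in range(0, len(flat), n):
        out.extend(softmax_1d(flat[start:start + n]))
    return out


def softmax_axis(flat, lens, axis):
    """General axis: offsets must be computed from row-major strides."""
    strides = [1] * len(lens)
    for i in range(len(lens) - 2, -1, -1):
        strides[i] = strides[i + 1] * lens[i + 1]
    n, step = lens[axis], strides[axis]
    out = [0.0] * len(flat)
    for base in range(len(flat)):
        if (base // step) % n != 0:
            continue  # only start from elements whose axis coordinate is 0
        offsets = [base + k * step for k in range(n)]
        for off, val in zip(offsets, softmax_1d([flat[o] for o in offsets])):
            out[off] = val
    return out


flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # a 2 x 3 tensor
print(softmax_last_axis(flat, 3) == softmax_axis(flat, [2, 3], 1))
```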

Fuse last instruction in fuse_pointwise (#1015) · e758d457
Paul Fultz II authored Dec 09, 2021
```
Fuse last instruction in fuse_pointwise
This also fixes a bug with using an invalid iterator.
```
e758d457

08 Dec, 2021 1 commit
- Fuse convert ops (#1020) · 00bfed4d
  Paul Fultz II authored Dec 08, 2021
  
  00bfed4d
07 Dec, 2021 2 commits

Rename reduce_inputs to virtual_inputs (#1021) · 1793cc54
Paul Fultz II authored Dec 07, 2021
```
simple variable rename
```
1793cc54

Test runner match input output using tensor names (#996) · 0f9b4072

Shucai Xiao authored Dec 07, 2021

1. The previous implementation assumed that the input and output .pb files are ordered, but this is not the case. We therefore use the tensor names in the input/output .pb files to match them to the inputs and outputs of the onnx model. (This change applies to the BERT_Squad model.)
2. When parsing a model with dynamic input shapes, the current implementation uses the default batch_size for the unknown dims, which can cause parsing errors in some cases (e.g. the mask_rcnn model). The solution is to first read an input to get its shape, then use these shapes to parse the onnx model.

0f9b4072
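
The name-based matching in item 1 can be sketched as follows. The tensors are modeled as plain `(name, data)` pairs in arbitrary file order; the helper name and the example tensor names are hypothetical, and reading the .pb files themselves is left out.

```python
def match_by_name(param_names, tensors):
    """Map each model parameter name to the data of the same-named tensor.

    tensors: list of (name, data) pairs, in whatever order the files load.
    """
    by_name = {name: data for name, data in tensors}
    missing = [p for p in param_names if p not in by_name]
    if missing:
        raise KeyError(f"no tensor found for: {missing}")
    return {p: by_name[p] for p in param_names}


# Files loaded in a different order than the model declares its inputs.
loaded = [("segment_ids", [0, 0, 1]), ("input_ids", [101, 2054, 102])]
print(match_by_name(["input_ids", "segment_ids"], loaded))
```

Failing loudly on a missing name is a design choice of this sketch: silently falling back to positional order would reintroduce the original problem.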

05 Dec, 2021 1 commit
- Change in documentation for roctx knob (#984) · 26d90328
  Cagri authored Dec 05, 2021
```
Adds description for roctx knob of migraphx-driver in documentation.
```
  26d90328
02 Dec, 2021 1 commit
- Fix pointwise compile error with half sqrt (#1010) · 7b3e58a0
  Paul Fultz II authored Dec 02, 2021
```
Fix pointwise compile error with half sqrt 
```
  7b3e58a0
30 Nov, 2021 2 commits
- Fix fusable_conv whitespace bug (#1008) · 9270ebaf
  turneram authored Nov 30, 2021
```
Fix whitespace bug in fusable_conv matcher and add unit test
```
  9270ebaf
- Fix vectorization of broadcasted inputs in pointwise fusions (#1011) · 5dfafd00
  Paul Fultz II authored Nov 30, 2021
  
  5dfafd00
25 Nov, 2021 2 commits

Non std shape auto contiguous (#1001) · 2d4dcc47

Shucai Xiao authored Nov 25, 2021

Resolves a problem in parsing the ssd-10 model.

The problem is that, after contiguous is inserted in the auto_contiguous pass, the standard output shape of some operators becomes non-standard. Then, if the next operator requires a standard input shape, an exception is thrown.

For example, consider the following model:
Input (standard shape) -> transpose (transposed) -> softmax (transposed) -> transpose (standard) -> gather.
It works fine, and no contiguous is required.

In the auto_contiguous pass, a contiguous is inserted after the first transpose. Then we need to replace the first transpose with the contiguous and recompute all shapes. When it comes to the gather operator, its input has a transposed shape, and an exception is thrown.

The solution is in the recompute_shape() function. If it is called by the auto_contiguous pass, the shape of an instruction has changed, and the new shape is non-standard, we do not recompute the shapes of its outputs. The reason is that, since its output shape is non-standard, a contiguous op will be added after the instruction, and that op will recompute the shapes of later operators.

2d4dcc47
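
To make the standard versus transposed distinction concrete, the sketch below models a shape as lengths plus strides. It assumes "standard" means packed row-major strides, which is a simplification for illustration; the helper names are made up and this is not the MIGraphX shape class.

```python
def packed_strides(lens):
    """Row-major strides for a densely packed tensor with these lengths."""
    strides = [1] * len(lens)
    for i in range(len(lens) - 2, -1, -1):
        strides[i] = strides[i + 1] * lens[i + 1]
    return strides


def transpose(lens, strides, perm):
    """A transpose only permutes lengths and strides; no data moves."""
    return [lens[p] for p in perm], [strides[p] for p in perm]


def is_standard(lens, strides):
    """Assumption: standard means strides equal the packed row-major ones."""
    return strides == packed_strides(lens)


lens = [2, 3, 4]
strides = packed_strides(lens)                      # [12, 4, 1]
t_lens, t_strides = transpose(lens, strides, [0, 2, 1])
print(is_standard(lens, strides))                   # input: standard
print(is_standard(t_lens, t_strides))               # after transpose: not
# A contiguous op keeps the lengths but repacks the strides.
print(is_standard(t_lens, packed_strides(t_lens)))  # standard again
```

Applying the same transpose a second time restores the original strides, which is why the transpose, softmax, transpose chain in the example ends in a standard shape without any contiguous.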

Bump tensorflow from 2.5.1 to 2.5.2 in /examples/nlp/python_bert_squad (#1004) · 2788f647

dependabot[bot] authored Nov 25, 2021

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.5.1 to 2.5.2.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.5.1...v2.5.2)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

2788f647

24 Nov, 2021 1 commit
- Check jit kernels files with clang-tidy (#1012) · a33d6fa2
  Paul Fultz II authored Nov 24, 2021
```
* Check jit kernels files with clang-tidy
```
  a33d6fa2
22 Nov, 2021 1 commit

Helper script for rocTX run and parse (#985) · 4f9a0ce7

Cagri authored Nov 22, 2021

This provides a helper script to run rocTX markers with migraphx-driver, reducing the number of steps a user has to go through to use the rocTX knob.
Run:
python roctx.py --run --onnx_file <ONNX_FILE> --migraphx_args "--onnx --gpu --fp16 --batch 16" --out outputfolder
This runs the driver and parses the run output (a JSON file). An example output is given below:

| Marker | SUM | MIN | MAX |
|---|---|---|---|
| Marker start: gpu::convolution | 5272 | 10 | 563 |
| Marker start: gpu::add_relu | 605 | 12 | 18 |
| Marker start: gpu::gather | 299 | 145 | 154 |
| Marker start: gpu::mul_add | 227 | 14 | 57 |
| Marker start: gpu::sub | 177 | 13 | 42 |
| Marker start: gpu::concat | 169 | 22 | 31 |
| Marker start: gpu::triadd_relu | 163 | 15 | 18 |
| Marker start: load | 141 | 0 | 3 |
| Marker start: hip::hip_copy_literal | 111 | 0 | 3 |
| Marker start: gpu::add | 58 | 13 | 17 |
| Marker start: broadcast | 52 | 0 | 3 |
| Marker start: gpu::convert | 31 | 15 | 16 |
| Marker start: slice | 11 | 0 | 1 |
| Marker start: gpu::pooling | 9 | 9 | 9 |
| Marker start: step | 2 | 2 | 2 |
| Marker start: @param | 2 | 0 | 1 |
| Marker start: reshape | 1 | 0 | 1 |
| Marker start: hip::hip_allocate_memory | 1 | 1 | 1 |
| Marker start: check_context::migraphx::version_... | 0 | ERR | ERR |

TOTAL TIME: 7331 us

JSON FILE PATH: [...]/rpl_data_211019_195229_9369/input_results_211019_195229/trace.json
Parse:
python roctx.py --parse --json_path <JSON PATH FROM RUN>
Note: The parse knob is made available if the user wants to parse an already existing JSON output.

4f9a0ce7
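
The SUM, MIN and MAX columns in the example output are per-marker aggregates, and the SUM column adds up to the reported total time. A minimal sketch of that aggregation is shown below. It works on hypothetical `(marker, duration)` pairs and does not model the layout of the JSON trace or the internals of roctx.py.

```python
from collections import defaultdict


def summarize(events):
    """Aggregate (marker_name, duration_us) pairs into per-marker rows.

    Returns (name, sum, min, max) tuples sorted by descending sum.
    """
    durations = defaultdict(list)
    for name, duration in events:
        durations[name].append(duration)
    rows = [(name, sum(d), min(d), max(d)) for name, d in durations.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Made-up durations, for illustration only.
events = [("gpu::add", 13), ("gpu::convolution", 563),
          ("gpu::add", 17), ("gpu::convolution", 10)]
rows = summarize(events)
for row in rows:
    print(*row)
print("TOTAL TIME:", sum(row[1] for row in rows), "us")
```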