Commits · 5496a55d0802c23dfd5a46e2ec5a958ca74b4e8e · gaoqiong / MIGraphX

03 Feb, 2022 3 commits
- clang format · 5496a55d
  Shucai Xiao authored Feb 02, 2022
  
  5496a55d
- refine print out information · fad15f19
  Shucai Xiao authored Feb 02, 2022
  
  fad15f19
- Merge branch 'develop' of github.com:ROCmSoftwarePlatform/AMDMIGraphX into print_matmul_perf_flops · 5c22dd0f
  Shucai Xiao authored Feb 02, 2022
  
  5c22dd0f
02 Feb, 2022 1 commit

Update trace_eval to preview the output buffers (#1073) · b20e3d4d

Paul Fultz II authored Feb 02, 2022

Previously, MIGRAPHX_TRACE_EVAL=2 printed the entire output buffer, which can produce a lot of output. To make buffers easier to inspect and debug, MIGRAPHX_TRACE_EVAL=2 now prints only 10 elements from the buffer (the first 5 and the last 5) and shows any floating-point classifications found in the buffer (e.g., NaNs, infinities). The previous behavior can still be enabled with MIGRAPHX_TRACE_EVAL=3.

b20e3d4d
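
The preview behavior described in the commit above can be sketched roughly as follows. This is a minimal illustration, not MIGraphX's implementation: the function name `preview`, the `edge` parameter, and the classification labels are all assumptions made for the example.

```python
import math


def preview(buffer, edge=5):
    """Return (shown_elements, classifications) for a flat list of floats.

    Hypothetical helper: keeps the first and last `edge` elements and
    records which floating-point classes occur anywhere in the buffer.
    """
    if len(buffer) <= 2 * edge:
        shown = list(buffer)
    else:
        shown = list(buffer[:edge]) + list(buffer[-edge:])

    classes = set()
    for x in buffer:
        if math.isnan(x):
            classes.add("nan")
        elif math.isinf(x):
            classes.add("inf")
        elif x == 0.0:
            classes.add("zero")
        else:
            classes.add("normal")
    return shown, sorted(classes)
```

Scanning the whole buffer for classifications while printing only the edges means a NaN in the middle is still reported even though it is not shown.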

01 Feb, 2022 2 commits
- Merge branch 'develop' of github.com:ROCmSoftwarePlatform/AMDMIGraphX into print_matmul_perf_flops · 4da0e5e6
  Shucai Xiao authored Feb 01, 2022
  
  4da0e5e6
- Add python type annotations to api.py (#1061) · 2a79a9ff
  Paul Fultz II authored Feb 01, 2022
```
This will also check the types using mypy on the CI.
```
  2a79a9ff
31 Jan, 2022 4 commits
- Parse upsample (#1060) · 7e7ef0b8
  Shucai Xiao authored Jan 31, 2022
```
* use the parse_resize to parse the upsample operator
```
  7e7ef0b8
- additional refinement for gemm flops · c8dc5f7b
  Shucai Xiao authored Jan 31, 2022
  
  c8dc5f7b
- clang format · 779e6525
  Shucai Xiao authored Jan 30, 2022
  
  779e6525
- refine perf report and add flops for gemm op · 51452c03
  Shucai Xiao authored Jan 30, 2022
  
  51452c03
28 Jan, 2022 3 commits
- Add cppcheck to examples (#1070) · d0543c96
  Paul Fultz II authored Jan 28, 2022
```
Add cppcheck to examples
```
  d0543c96
- Add auto-vectorization of pointwise operators (#1047) · 78a3c9b7
  Paul Fultz II authored Jan 28, 2022
```
* Enable auto vectorization
* Handle vector types with convert function
* Don't vectorize when it will cause problems with preload
```
  78a3c9b7
- Add Mean op ONNX parser (#1065) · b7218806
  turneram authored Jan 28, 2022
```
* Add mean op onnx parser and unit tests
* Refactor parse_mean to use add_broadcastable_binary_op
```
  b7218806
27 Jan, 2022 1 commit
- Remove Standard Shape requirement for ArgOps (#1042) · 332cb710
  Umang Yadav authored Jan 27, 2022
```
Allow non-standard shapes for the arg ops. Non-standard shapes include broadcast, slice, and transpose.
```
  332cb710
26 Jan, 2022 1 commit

Add HardSwish op ONNX parser (#1066) · 7477aeb8

turneram authored Jan 26, 2022

Add HardSwish to HardSigmoid parser

HardSwish formula is y = x * HardSigmoid<alpha=1/6, beta=0.5>(x)
HardSigmoid parser sets alpha to 1/6 and adds the mul instruction if op name is HardSwish

Resolves #1062

7477aeb8
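
The relationship between the two operators in the commit above can be sketched on scalars as follows. This is only an illustration of the formula: the function name `hard_sigmoid_family` is hypothetical, and the real parser builds MIGraphX instructions rather than computing values directly.

```python
def hard_sigmoid_family(op_name, x, alpha=0.2, beta=0.5):
    """Evaluate HardSigmoid or HardSwish for a single float.

    Hypothetical helper mirroring the commit description: one code path
    serves both ops, and HardSwish only changes alpha and adds a multiply.
    """
    if op_name == "HardSwish":
        # HardSwish fixes alpha to 1/6; beta keeps its default of 0.5.
        alpha = 1.0 / 6.0
    y = max(0.0, min(1.0, alpha * x + beta))
    if op_name == "HardSwish":
        # The extra "mul" step: y = x * HardSigmoid(x)
        y = x * y
    return y
```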

21 Jan, 2022 4 commits
- GreaterOrEqual ONNX parser (#1044) · 60aa1c85
  turneram authored Jan 21, 2022
```
Add onnx parser for operator GreaterOrEqual
```
  60aa1c85
- SoftSign ONNX parser (#1046) · ebb15dd3
  turneram authored Jan 21, 2022
```
Add onnx parser and unit tests for Softsign
```
  ebb15dd3
- SoftPlus ONNX parser (#1045) · 4c90e9a3
  turneram authored Jan 20, 2022
```
* Add onnx parser and unit test
```
  4c90e9a3
- Improve handling of generator expressions when getting the flags for hip (#1055) · 3f392a3b
  Paul Fultz II authored Jan 20, 2022
```
* Improve handling of generator expressions when getting the flags for hip
```
  3f392a3b
20 Jan, 2022 2 commits
- Add env variable to dump tests to a file (#1041) · 51b4439f
  Paul Fultz II authored Jan 20, 2022
  
  51b4439f
- Update satackey/action-docker-layer-caching to v 0.0.11 (#1057) · 0fbbba26
  Chris Austen authored Jan 20, 2022
```
There have been hangs in the CI runs recently. GitHub runner jobs
are failing due to exceeding file system size. Upgrading to 0.0.11
resolves this issue.
```
  0fbbba26
17 Jan, 2022 1 commit
- Make clip a pointwise op (#1043) · b0ece214
  Paul Fultz II authored Jan 17, 2022
```
Make clip a pointwise op
```
  b0ece214
11 Jan, 2022 1 commit

HardSigmoid ONNX parser (#1040) · fc42d852

turneram authored Jan 11, 2022

Add HardSigmoid onnx parser and unit tests
Produces the mathematical equivalent of the ONNX operator through a combination of existing pointwise ops.
Resolves #1028

fc42d852
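
A rough picture of how HardSigmoid can be composed from simpler pointwise steps, as the commit above describes, is given below. The ONNX definition is y = max(0, min(1, alpha * x + beta)) with defaults alpha = 0.2 and beta = 0.5. The helper names `mul`, `add`, and `clip` are illustrative stand-ins, and the exact set of ops the parser emits is an assumption.

```python
def mul(xs, c):
    """Pointwise multiply by a constant."""
    return [x * c for x in xs]


def add(xs, c):
    """Pointwise add a constant."""
    return [x + c for x in xs]


def clip(xs, lo, hi):
    """Pointwise clamp into [lo, hi]."""
    return [max(lo, min(hi, x)) for x in xs]


def hard_sigmoid(xs, alpha=0.2, beta=0.5):
    # y = clip(alpha * x + beta, 0, 1), built only from pointwise steps
    return clip(add(mul(xs, alpha), beta), 0.0, 1.0)
```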

10 Jan, 2022 1 commit
- Handle miopen fusions when using pointwise fusions (#1019) · 534a05c1
  Paul Fultz II authored Jan 10, 2022
```
* Add matcher for conv_bias pointwise
* Add fusion op
```
  534a05c1
05 Jan, 2022 1 commit
- Fix time seed bug in random sequence ops (#1027) · 594f2802
  turneram authored Jan 05, 2022
```
Fix bug caused by casting time seed to float
```
  594f2802
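
One way a time seed can go wrong when cast to float is loss of precision: a 32-bit float has a 24-bit significand, so epoch-second values near 1.6e9 are only representable in steps of 128. The sketch below demonstrates that effect in isolation. Whether this is exactly the failure fixed in the commit above is an assumption, and `to_float32` is a helper written for this example.

```python
import struct


def to_float32(value):
    """Round a number to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


seed_a = 1641340800  # 2022-01-05 00:00:00 UTC in epoch seconds
seed_b = seed_a + 1  # one second later

# Distinct integer seeds collapse to the same float32 value.
collapsed = to_float32(seed_a) == to_float32(seed_b)
```

Keeping the seed in an integer type avoids the collapse, since consecutive seconds then stay distinct.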
10 Dec, 2021 1 commit

Updates to examples (#1022) · 46b0c33b

Cagri authored Dec 10, 2021

nfnet update
3dunet requirements via pip
3dunet requirement and nb-clean

46b0c33b

09 Dec, 2021 2 commits

Softmax perf optimization (#1014) · 2e337c7f

Shucai Xiao authored Dec 09, 2021

Changed the number of threads in a block from 256 to 128
Increased the max number of blocks in the kernel from 256 to 1M.
For the case that the axis is the last dimension, we removed the computation of index since it is not required.

With these changes, we get about a 2x speedup compared to the develop branch for the softmax op used in the BertSquad model.

2e337c7f
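
The last-dimension shortcut mentioned in the commit above can be illustrated on the CPU. When softmax runs over the last axis of a row-major buffer, each row is contiguous, so an element's position is just `row * n + i` and no general multi-index computation is needed. This is a plain Python sketch of that idea, not the GPU kernel, and `softmax_last_axis` is a hypothetical name.

```python
import math


def softmax_last_axis(data, n):
    """Softmax over rows of length n in a flat, row-major buffer."""
    out = [0.0] * len(data)
    for row in range(len(data) // n):
        start = row * n  # contiguous row: no per-element multi-index
        segment = data[start:start + n]
        peak = max(segment)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in segment]
        total = sum(exps)
        for i, e in enumerate(exps):
            out[start + i] = e / total
    return out
```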

Fuse last instruction in fuse_pointwise (#1015) · e758d457
Paul Fultz II authored Dec 09, 2021
```
Fuse last instruction in fuse_pointwise
This also fixes a bug with using an invalid iterator.
```
e758d457

08 Dec, 2021 1 commit
- Fuse convert ops (#1020) · 00bfed4d
  Paul Fultz II authored Dec 08, 2021
  
  00bfed4d
07 Dec, 2021 2 commits

Rename reduce_inputs to virtual_inputs (#1021) · 1793cc54
Paul Fultz II authored Dec 07, 2021
```
simple variable rename
```
1793cc54

Test runner match input output using tensor names (#996) · 0f9b4072

Shucai Xiao authored Dec 07, 2021

1. The previous implementation assumed that the input and output .pb files are ordered, but that is not the case. Instead, we use the tensor names in the input/output .pb files to match the inputs and outputs of the ONNX model. (This change applies to the BERT_Squad model.)
2. When parsing a model with dynamic input shapes, the current implementation uses the default batch_size for the unknown dims, which can cause parsing errors in some cases (e.g., the mask_rcnn model). The solution is to first read an input to get its shape, then use these shapes to parse the ONNX model.

0f9b4072
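
Matching by tensor name rather than by file order, as in the first point of the commit above, might look like the sketch below. The function `match_by_name` and the tensor names in the usage note are hypothetical. The real test runner reads protobuf files, which is omitted here.

```python
def match_by_name(model_input_names, loaded_tensors):
    """Pair each model input with the loaded tensor of the same name.

    loaded_tensors is a list of (name, value) pairs in arbitrary file order.
    """
    by_name = dict(loaded_tensors)
    missing = [n for n in model_input_names if n not in by_name]
    if missing:
        raise KeyError("no data file for inputs: " + ", ".join(missing))
    # The result follows the model's own input order.
    return {name: by_name[name] for name in model_input_names}
```

For example, `match_by_name(["ids", "mask"], [("mask", [1, 1]), ("ids", [7, 8])])` pairs each input with its own data even though the files arrived in the opposite order.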

05 Dec, 2021 1 commit
- Change in documentation for roctx knob (#984) · 26d90328
  Cagri authored Dec 05, 2021
```
Adds description for roctx knob of migraphx-driver in documentation.
```
  26d90328
02 Dec, 2021 1 commit
- Fix pointwise compile error with half sqrt (#1010) · 7b3e58a0
  Paul Fultz II authored Dec 02, 2021
```
Fix pointwise compile error with half sqrt 
```
  7b3e58a0
30 Nov, 2021 2 commits
- Fix fusable_conv whitespace bug (#1008) · 9270ebaf
  turneram authored Nov 30, 2021
```
Fix whitespace bug in fusable_conv matcher and add unit test
```
  9270ebaf
- Fix vectorization of broadcasted inputs in pointwise fusions (#1011) · 5dfafd00
  Paul Fultz II authored Nov 30, 2021
  
  5dfafd00
25 Nov, 2021 2 commits

Non std shape auto contiguous (#1001) · 2d4dcc47

Shucai Xiao authored Nov 25, 2021

Resolves a problem in parsing the ssd-10 model.

The problem is that, after inserting contiguous in the auto_contiguous pass, the standard output shape of some operators becomes non-standard. Then, if the next operator requires a standard input shape, an exception is thrown.

For example, if we pass the following model:
Input (standard shape) -> transpose (transposed) -> softmax (transposed) -> transpose (standard) -> gather.
It works fine, and no contiguous is required.

In the auto_contiguous pass, a contiguous is inserted after the first transpose. Then we need to replace the first transpose with the contiguous and recompute all shapes. When it comes to the gather operator, its input is a transposed shape, and an exception is thrown.

The solution is in the recompute_shape() function. If it is called by the auto_contiguous pass and the shape of an instruction changes to a non-standard shape, we do not recompute the shapes of its outputs. The reason is that, since its output shape is non-standard, a contiguous op will be added after the instruction, which will recompute the shapes of later operators.

2d4dcc47
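
To make the standard versus transposed distinction in the commit above concrete, here is a simplified model in which a shape is a pair of dimension lengths and strides, and "standard" means the strides are the packed row-major strides for those lengths. This is an assumption-laden sketch for illustration. MIGraphX's own shape class has more cases, and the helper names are hypothetical.

```python
def packed_strides(lens):
    """Row-major strides for a densely packed tensor."""
    strides = [1] * len(lens)
    for i in range(len(lens) - 2, -1, -1):
        strides[i] = strides[i + 1] * lens[i + 1]
    return strides


def transpose(lens, strides, perm):
    """Permute dimensions without moving data: lens and strides move together."""
    return [lens[p] for p in perm], [strides[p] for p in perm]


def is_standard(lens, strides):
    return strides == packed_strides(lens)


lens, strides = [2, 3, 4], packed_strides([2, 3, 4])  # standard input
t_lens, t_strides = transpose(lens, strides, [0, 2, 1])  # transposed view
b_lens, b_strides = transpose(t_lens, t_strides, [0, 2, 1])  # transposed back
```

In this model the first transpose yields a non-standard shape, and the second transpose restores a standard one, which matches the pipeline in the example where no contiguous is required.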

Bump tensorflow from 2.5.1 to 2.5.2 in /examples/nlp/python_bert_squad (#1004) · 2788f647

dependabot[bot] authored Nov 25, 2021

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.5.1 to 2.5.2.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.5.1...v2.5.2)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

2788f647

24 Nov, 2021 1 commit
- Check jit kernels files with clang-tidy (#1012) · a33d6fa2
  Paul Fultz II authored Nov 24, 2021
```
* Check jit kernels files with clang-tidy
```
  a33d6fa2
22 Nov, 2021 2 commits

Helper script for rocTX run and parse (#985) · 4f9a0ce7

Cagri authored Nov 22, 2021

This provides a helper script to run rocTX markers with migraphx-driver and reduces the number of steps a user has to go through when using the rocTX knob.
Run:
python roctx.py --run --onnx_file <ONNX_FILE> --migraphx_args "--onnx --gpu --fp16 --batch 16" --out outputfolder
Runs and parses the run output (JSON file). An example output is given below:

| Marker | SUM | MIN | MAX |
|---|---|---|---|
| Marker start: gpu::convolution | 5272 | 10 | 563 |
| Marker start: gpu::add_relu | 605 | 12 | 18 |
| Marker start: gpu::gather | 299 | 145 | 154 |
| Marker start: gpu::mul_add | 227 | 14 | 57 |
| Marker start: gpu::sub | 177 | 13 | 42 |
| Marker start: gpu::concat | 169 | 22 | 31 |
| Marker start: gpu::triadd_relu | 163 | 15 | 18 |
| Marker start: load | 141 | 0 | 3 |
| Marker start: hip::hip_copy_literal | 111 | 0 | 3 |
| Marker start: gpu::add | 58 | 13 | 17 |
| Marker start: broadcast | 52 | 0 | 3 |
| Marker start: gpu::convert | 31 | 15 | 16 |
| Marker start: slice | 11 | 0 | 1 |
| Marker start: gpu::pooling | 9 | 9 | 9 |
| Marker start: step | 2 | 2 | 2 |
| Marker start: @param | 2 | 0 | 1 |
| Marker start: reshape | 1 | 0 | 1 |
| Marker start: hip::hip_allocate_memory | 1 | 1 | 1 |
| Marker start: check_context::migraphx::version_... | 0 | ERR | ERR |

TOTAL TIME: 7331 us

JSON FILE PATH: [...]/rpl_data_211019_195229_9369/input_results_211019_195229/trace.json
Parse:
python roctx.py --parse --json_path <JSON PATH FROM RUN>
Note: The parse knob is made available if the user wants to parse an already existing JSON output.

4f9a0ce7
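
The SUM/MIN/MAX summary in the commit above amounts to grouping durations by marker name. The sketch below shows that aggregation on already-extracted `(marker, duration_us)` pairs. The commit does not describe the layout of the trace JSON, so the input format and the function name `summarize` are assumptions, not what roctx.py actually does.

```python
from collections import defaultdict


def summarize(events):
    """Aggregate (marker_name, duration_us) pairs into per-marker rows.

    Returns (rows, total), where each row is (name, sum, min, max) and rows
    are sorted by descending sum, like the example output.
    """
    durations = defaultdict(list)
    for name, duration in events:
        durations[name].append(duration)
    rows = [(name, sum(v), min(v), max(v)) for name, v in durations.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    total = sum(row[1] for row in rows)
    return rows, total
```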

Add fp16 verify to driver (#988) · 3c1e91dc

kahmed10 authored Nov 22, 2021

Allows --fp16 to be used in the driver to compare the target fp16 result and the ref fp32 result.

3c1e91dc
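
Comparing an fp16 result against an fp32 reference, as the commit above describes, needs a tolerance loose enough for half precision, which carries about 11 significant bits. The sketch below simulates the fp16 side by rounding through Python's half-precision `struct` format. The `verify` function and its tolerance are illustrative assumptions, not the driver's actual comparison.

```python
import struct


def to_half(x):
    """Round a float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def verify(ref, target, tol=1e-3):
    """Check that every target value is within tol of the reference,
    scaled by the reference magnitude (hypothetical tolerance rule)."""
    if len(ref) != len(target):
        return False
    return all(abs(r - t) <= tol * max(1.0, abs(r)) for r, t in zip(ref, target))


reference = [0.1, 1.0, 3.14159]
half_result = [to_half(v) for v in reference]
```

Here `verify(reference, half_result)` passes because the rounding error stays below the tolerance, while a genuinely different value such as 3.2 in place of 3.14159 fails.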