Commits · e5242676aa7f1e246a3c71f10e21f9bd85feab3d · gaoqiong / MIGraphX

25 Feb, 2022 1 commit
- Add get_queue to context to get the current stream (#1097) · e5242676
  Paul Fultz II authored Feb 24, 2022
```
wrapped in an any_ptr class so the type can be checked at runtime for a mismatch.
```
  e5242676
23 Feb, 2022 1 commit

Shucai Xiao authored Feb 23, 2022

This PR resolves two problems in issue #999, i.e., non_standard_shape input to reshape and reduce_mean.
Three fixes:

- Any operator that requires a standard shape gets a contiguous added for each of its inputs.
- In eliminate_contiguous, when computing whether a contiguous can be removed, we should use all the updated args, not just the one being checked.
- In two optimizations in simplify_reshape, we remove contiguous from the reshaper name list, since eliminate_contiguous will remove the contiguous if it can be removed.

The solution is to add an attribute to each operator that requires a standard input shape; then, in the auto_contiguous pass, a contiguous is added to every input of such operators.
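
As a rough illustration of what "standard" versus "non-standard" means in these commit messages, the sketch below treats a shape as standard when its strides are the packed row-major strides for its lengths. This definition and the helper names (`row_major_strides`, `is_standard`, `transpose`) are assumptions made for illustration only; they are not the MIGraphX API, and the library's actual check may differ in edge cases.

```python
def row_major_strides(lens):
    """Packed row-major strides for the given dimension lengths."""
    strides = [1] * len(lens)
    for i in range(len(lens) - 2, -1, -1):
        strides[i] = strides[i + 1] * lens[i + 1]
    return strides


def is_standard(lens, strides):
    """Assumed definition: standard means packed row-major strides."""
    return list(strides) == row_major_strides(lens)


def transpose(lens, strides, perm):
    """Permute lengths and strides without moving any data."""
    return [lens[p] for p in perm], [strides[p] for p in perm]


lens, strides = [2, 3, 4], row_major_strides([2, 3, 4])
t_lens, t_strides = transpose(lens, strides, [0, 2, 1])
print(is_standard(lens, strides))      # packed input
print(is_standard(t_lens, t_strides))  # transposed view
```

Under this assumed definition, a contiguous op corresponds to copying the data so that the result has packed row-major strides again.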

98dfdf15

16 Feb, 2022 1 commit
- Support nonstandard shapes for the UnSqueeze Op (#1071) · 4480eb79
  Umang Yadav authored Feb 16, 2022
```
Support nonstandard shapes like slice, broadcast and transpose for the unsqueeze op
```
  4480eb79
09 Feb, 2022 1 commit
- Support nonstandard shapes for the Squeeze Op (#1068) · e64b773f
  Umang Yadav authored Feb 09, 2022
```
Support slice, broadcast and transpose shapes for the squeeze op.
```
  e64b773f
08 Feb, 2022 1 commit
- Enforce types to avoid compilation error in pointwise fusions (#1077) · 73b8a773
  Paul Fultz II authored Feb 08, 2022
```
Enforce types to avoid compilation error in pointwise fusions
This fixes compile failure: gpt-2, fp16 on Navi
```
  73b8a773
02 Feb, 2022 1 commit

Update trace_eval to preview the output buffers (#1073) · b20e3d4d

Paul Fultz II authored Feb 02, 2022

Currently, MIGRAPHX_TRACE_EVAL=2 prints out the entire output buffer, but this can produce a lot of output. To make it easier to inspect and debug, MIGRAPHX_TRACE_EVAL=2 now prints only 10 elements from the buffer (the first 5 and the last 5) and shows any fp classifications found in the buffer (i.e., NaNs, infinity, etc.). The previous behavior can still be enabled with MIGRAPHX_TRACE_EVAL=3.
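
The preview behavior described above can be sketched roughly as follows. This is not the MIGraphX implementation: the function name `preview_buffer`, the output format, and the set of class labels are all hypothetical, and subnormal values are simply lumped in with "normal" to keep the sketch short.

```python
import math


def preview_buffer(values, edge=5):
    """Return a shortened preview and the fp classes seen in the buffer."""
    values = list(values)
    if len(values) <= 2 * edge:
        shown = ", ".join(str(v) for v in values)
    else:
        head = ", ".join(str(v) for v in values[:edge])
        tail = ", ".join(str(v) for v in values[-edge:])
        shown = f"{head}, ..., {tail}"

    # Classify every element, not only the ones shown in the preview.
    classes = set()
    for v in values:
        if math.isnan(v):
            classes.add("nan")
        elif math.isinf(v):
            classes.add("inf")
        elif v == 0:
            classes.add("zero")
        else:
            classes.add("normal")
    return shown, sorted(classes)


shown, classes = preview_buffer([float(i) for i in range(20)])
print(shown)
print(classes)
```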

b20e3d4d

28 Jan, 2022 1 commit

Add auto-vectorization of pointwise operators (#1047) · 78a3c9b7

Paul Fultz II authored Jan 28, 2022

* Enable auto vectorization
* Handle vector types with convert function
* Don't vectorize when it will cause problems with preload

78a3c9b7

27 Jan, 2022 1 commit
- Remove Standard Shape requirement for ArgOps (#1042) · 332cb710
  Umang Yadav authored Jan 27, 2022
```
Allow nonstandard shapes for the arg ops; nonstandard shapes include broadcast, slice and transpose.
```
  332cb710
17 Jan, 2022 1 commit
- Make clip a pointwise op (#1043) · b0ece214
  Paul Fultz II authored Jan 17, 2022
```
Make clip a pointwise op
```
  b0ece214
08 Dec, 2021 1 commit
- Fuse convert ops (#1020) · 00bfed4d
  Paul Fultz II authored Dec 08, 2021
  
  00bfed4d
25 Nov, 2021 1 commit

Non std shape auto contiguous (#1001) · 2d4dcc47

Shucai Xiao authored Nov 25, 2021

Resolves a problem in parsing the ssd-10 model.

The problem is that, after inserting contiguous in the auto_contiguous pass, the standard output shape of some operators becomes non-standard. Then, if the next operator requires a standard input shape, an exception is thrown.

For example, if we pass the following model:
Input (standard shape) -> transpose (transposed) -> softmax (transposed) -> transpose (standard) -> gather.
It works fine, and no contiguous is required.

In the auto_contiguous pass, a contiguous is inserted after the first transpose. Then we need to replace the first transpose with the contiguous and recompute all shapes. When it comes to the gather operator, its input is a transposed shape, and an exception is thrown.

The solution is in the recompute_shape() function. If it is called by the auto_contiguous pass, the shape of an instruction has changed, and the new shape is non_standard, we do not recompute the shapes of its outputs. The reason is that, since its output shape is non_standard, a contiguous op will be added after the instruction, which will recompute the shapes of later operators.

2d4dcc47

22 Nov, 2021 1 commit

Add fp16 verify to driver (#988) · 3c1e91dc

kahmed10 authored Nov 22, 2021

Allows --fp16 to be used in the driver to compare the target fp16 result and the ref fp32 result.

3c1e91dc

17 Nov, 2021 1 commit

Handle removing contiguous on operators that use modules (#1005) · 785307c3

Paul Fultz II authored Nov 17, 2021

Currently, eliminate_contiguous will never remove contiguous for operators that use module inputs due to the fact that it doesn't pass the module inputs to compute_shape.

- Update to pass the module inputs correctly to compute_shape
- Fix the overloads of compute_shape so that when passed an empty vector of module inputs it will call the overload without module inputs
- Add tests with contiguous and pointwise module function.
- Move add_pointwise function to a separate header to reuse across different tests

785307c3

15 Nov, 2021 1 commit

Update driver's perf report to account for batch size (#1000) · 19f65e7e

kahmed10 authored Nov 15, 2021

Currently we have the option of passing in --batch to the driver to change the batch size when the model has a dynamic dim value. We can use this flag to adjust the perf report's rate.
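
A minimal sketch of how a batch size could scale a reported rate, assuming the rate is expressed as inferences per second and computed from an average run time in milliseconds. The function name and the exact formula are illustrative assumptions, not taken from the driver's source.

```python
def inferences_per_second(avg_time_ms, batch_size=1):
    """Rate for one run of `avg_time_ms` that processes `batch_size` inputs."""
    if avg_time_ms <= 0:
        raise ValueError("avg_time_ms must be positive")
    return batch_size * 1000.0 / avg_time_ms


print(inferences_per_second(10.0))                # batch of 1
print(inferences_per_second(10.0, batch_size=4))  # same run time, batch of 4
```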

19f65e7e

11 Nov, 2021 1 commit

Conditionally enable pointwise fusion (#992) · 157935ff

Paul Fultz II authored Nov 10, 2021

This enables the pointwise fusions using the MIGRAPHX_ENABLE_POINTWISE_FUSION env variable. It's disabled by default since MIOpen fusions need to be refactored.

This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math pass with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.

157935ff

05 Nov, 2021 1 commit
- Update Docker to ROCm 4.5 and support Navi on Jenkins (#994) · 04e17804
  kahmed10 authored Nov 05, 2021
```
Move our Docker file from ROCm 4.3 to 4.5
Add Navi-based GPUs to the CI infrastructure
```
  04e17804
28 Oct, 2021 1 commit

NonMaxSuppression op ref implementation (#968) · c98b22d8

Shucai Xiao authored Oct 28, 2021

This PR is the ref implementation of the nonmaxsuppression operator. It always returns the max possible output shape, which is the problem tracked in issue #948.

c98b22d8

20 Oct, 2021 1 commit

Roialign (#952) · d7653732

Shucai Xiao authored Oct 20, 2021

Implementation of the roialign operator. For now, we have only the ref implementation. When a model is run on the GPU, execution of this operator falls back to the ref implementation.

d7653732

19 Oct, 2021 1 commit

Fusion of pointwise operators (#969) · 351007d4

Paul Fultz II authored Oct 19, 2021

Adds a pass to fuse pointwise operators into one "pointwise" op that has a submodule which does the calculation.

351007d4

18 Oct, 2021 1 commit
- Remove redundant cast (#982) · a05113aa
  Paul Fultz II authored Oct 18, 2021
```
Enable a cppcheck rule to catch these redundant casts in the future
```
  a05113aa
15 Oct, 2021 1 commit

Enabling rocTX markers for migraphx-driver via roctx knob (#946) · 4a71ec8c

Cagri authored Oct 14, 2021



Added features:
This enables wrapping each migraphx operator with rocTX markers.
It adds a new knob, trace, to the migraphx-driver binary.

Limitation:

rocTX standalone does not output a file; it needs to be used with rocprof. Example command line:

/opt/rocm/bin/rocprof -i ./in.txt --hip-trace --roctx-trace --flush-rate 10ms --timestamp on -d cagri_out --obj-tracking on /opt/rocm/bin/migraphx-driver trace ./resnet50-v2-7.onnx --onnx --gpu
Co-authored-by: Shucai Xiao <shucai@gmail.com>

4a71ec8c

08 Oct, 2021 2 commits

Nonzero op extension (#870) · 0879b5f1

Shucai Xiao authored Oct 08, 2021

This PR is for the nonzero operator with static output shape.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

0879b5f1

Remove alpha and beta from `dot` and `quant_dot` (#961) · 21193e87

Umang Yadav authored Oct 08, 2021

Previously, the dot operator was defined as C = alpha * A . B + beta * C, where * is scalar multiplication and . is the dot product or matrix multiplication, depending on the dimensions of the inputs.

The aim is to define the dot operator as C = A . B, without alpha or beta.

To achieve the same effect as alpha and beta: (1) one of the inputs to the dot operator is multiplied by the alpha value; (2) if beta is present, C is multiplied by beta and then added to the output from step 1.
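
The two-step decomposition described above can be sketched with plain nested lists. The helper names (`matmul`, `scale`, `add`, `dot_with_alpha_beta`) are hypothetical and only illustrate the arithmetic; they are not MIGraphX operators, and the choice to scale the first input is an assumption, since the message only says one of the inputs is scaled.

```python
def matmul(a, b):
    """Plain matrix product C = A . B for nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scale(m, s):
    """Multiply every element of m by the scalar s."""
    return [[s * x for x in row] for row in m]


def add(m, n):
    """Elementwise sum of two matrices with the same dimensions."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m, n)]


def dot_with_alpha_beta(a, b, c=None, alpha=1.0, beta=0.0):
    # Step 1: fold alpha into one input, then use the plain dot.
    out = matmul(scale(a, alpha), b)
    # Step 2: if C and beta are present, add beta * C to the result.
    if c is not None and beta != 0.0:
        out = add(out, scale(c, beta))
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
print(dot_with_alpha_beta(a, b, c, alpha=2.0, beta=3.0))
```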

21193e87

01 Oct, 2021 1 commit

Add multinomial op (#954) · 0b7672d7

turneram authored Oct 01, 2021

Add multinomial op to onnx parser with ref and GPU implementations.

The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative distribution function. The multinomial operator multiplies the random values by the sum of the CDF and returns the index of the first element of the CDF that is greater than the result, representing samples randomly drawn from [0, class_size) that follow the log-probability distribution.
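
A rough sketch of this sampling scheme for a single batch row is shown below. It assumes the CDF is the running sum of the exponentiated log-probabilities and that "the sum of the CDF" refers to its final (total) value; both readings are assumptions, and the function name `multinomial_sample` is hypothetical.

```python
import math
from itertools import accumulate


def multinomial_sample(log_probs, randoms):
    """Map uniform values in [0, 1) to class indices via the CDF."""
    # Running sum of exp(log_prob); the last entry is the total mass.
    cdf = list(accumulate(math.exp(lp) for lp in log_probs))
    total = cdf[-1]
    samples = []
    for r in randoms:
        threshold = r * total
        # Index of the first CDF entry greater than the scaled random value.
        # The default guards against floating-point rounding at the top end.
        idx = next((i for i, c in enumerate(cdf) if c > threshold), len(cdf) - 1)
        samples.append(idx)
    return samples


log_probs = [math.log(p) for p in (0.2, 0.3, 0.5)]
print(multinomial_sample(log_probs, [0.1, 0.35, 0.9]))
```

Scaling the random value by the total means the input does not need to be normalized beforehand.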

Resolves #821
Co-authored-by: Shucai Xiao <shucai@gmail.com>

0b7672d7

21 Sep, 2021 1 commit
- Add flag to bypass passes on modules (#949) · da26db34
  Paul Fultz II authored Sep 21, 2021
```
Needed to bypass passes when fusing pointwise operators into a module.
```
  da26db34
17 Sep, 2021 2 commits

Revert "Remove alpha and beta attributes from dot operator (#945)" (#957) · 985f58b0
Paul Fultz II authored Sep 17, 2021
```
This reverts commit 9e43cb8b.
```
985f58b0

Remove alpha and beta attributes from dot operator (#945) · 9e43cb8b

Umang Yadav authored Sep 17, 2021

This PR aims to remove alpha and beta attributes from dot operator completely.

Previously, the dot operator was defined as C = alpha * A . B + beta * C, where * is scalar multiplication and . is the dot product or matrix multiplication, depending on the dimensions of the inputs.

The aim is to define the dot operator as C = A . B, without alpha or beta.

9e43cb8b

16 Sep, 2021 1 commit

Loop operator (#853) · a275f590

Shucai Xiao authored Sep 16, 2021

Add Loop operator for opset version 13.
Notes: 1) The default max iteration number is 10 if no max iteration number is provided.
2) To change the max iteration number, a user can set max_loop_iterations in the onnx_option struct when parsing a model.
3) The returned shape of the scan output is derived from max_loop_iterations even if the actual number of loop iterations is smaller. This issue also applies to other operators like NonZero and NonMaxSuppression. Issue #948 was created to track this, to be resolved later.
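
Note 3 can be illustrated with a small sketch in which the scan output always has `max_loop_iterations` slots, regardless of how many iterations actually run. The function `run_loop`, the body signature, and the zero padding of unused slots are assumptions for illustration; the commit message does not say what the unused part of the output contains.

```python
def run_loop(body, state, trip_count=None, max_loop_iterations=10, pad_value=0):
    """Run `body` up to the iteration limit; the scan output has fixed length."""
    limit = max_loop_iterations if trip_count is None else min(trip_count, max_loop_iterations)
    # The output shape is fixed up front from max_loop_iterations.
    scan = [pad_value] * max_loop_iterations
    for i in range(limit):
        state, keep_going, out = body(i, state)
        scan[i] = out
        if not keep_going:
            break
    return state, scan


def accumulate_index(i, total):
    """Hypothetical loop body: add the iteration index to the running total."""
    total = total + i
    return total, True, total


final_state, scan_output = run_loop(accumulate_index, 0, trip_count=3)
print(final_state, scan_output, len(scan_output))
```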
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

a275f590

07 Sep, 2021 2 commits

qdq for quantization and include subgraph (#891) · b45f7239

Shucai Xiao authored Sep 07, 2021



* Add operators, refactor parsers, add rewrite passes, add tests
* Add ref implementations
* Move broadcasting of scales and zero points to onnx parser
* Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type
* fp16 and fp8 quantization to include subgraph and parameters
* fix unit test to use qdq operators for int8 quantization
Co-authored-by: turneram <alturner@amd.com>

b45f7239

Allow creating modules in a module pass (#931) · ac0f79aa
Paul Fultz II authored Sep 07, 2021
```
* Add module pass manager
```
ac0f79aa

02 Sep, 2021 2 commits

Refactor where op (#918) · ebbaf8fc

turneram authored Sep 02, 2021

Implement the Where operator for the CPU and GPU for better performance.

ebbaf8fc

Topk op (#877) · 521b57a2

Shucai Xiao authored Sep 01, 2021



* add topk operator for ref, cpu and gpu
* Hash modules for quicker lookup of modules
* add onnx unit test
* add unit tests for the topk operator
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

521b57a2

01 Sep, 2021 1 commit
- Add a command to the driver to list supported onnx operators (#938) · 1f741f73
  Paul Fultz II authored Sep 01, 2021
```
* Add a command to list supported onnx operators
```
  1f741f73
31 Aug, 2021 3 commits

Enable constructing argument with tuple and buffer (#919) · b90d69ae

Paul Fultz II authored Aug 31, 2021



* Improve handling of constructing a tuple from a buffer
* Add unit test
* Remove unused function
Co-authored-by: Shucai Xiao <shucai@gmail.com>

b90d69ae

Changes to support both OneDNN and ZenDNN builds (#929) · 0859fe90

kahmed10 authored Aug 31, 2021



* Add preallocate method

* Add preallocate_param pass

* Preallocate buffers on the cpu

* Formatting

* Preallocate on the gpu

* Add missing cpp file

* Formatting

* Add lifetime function

* Formatting

* Improve handling of exceptions in test driver

* Formatting

* Auto print exception

* Formatting

* Fork each test case

* Formatting

* Exclude gcc 5 debug build

* Fix tidy issues

* Add color

* Formatting

* Create driver class

* Formatting

* Customize test_case names

* Formatting

* Report status from forked processes

* Formatting

* Update the verify driver

* Formatting

* Print out failed tests

* Formatting

* Fix tidy issues

* Formatting

* Expect passing

* Improve failure reporting on non-linux systems

* Fix ifdef

* Always allocate

* Fix tidy warning

* Flush code cov

* Formatting

* Fix tidy

* Add const

* Check if weak symbols is linked

* Formatting

* initial progress

* formatting

* Add continue flag

* Formatting

* Set exe name

* Use stringstream and use quotes

* rename vars

* formatting

* more testing

* formatting

* Fix bug when using --continue in the tests

* Formatting

* revert gemm

* revert dot file

* rename var

* update cmakelists and deconv compute
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

0859fe90

Fix debug assert (#930) · bd85a76c

Shucai Xiao authored Aug 31, 2021

* fix two asserts for debug build

* add unit test for copy parameters

* clang format

* add a unit test for reorder_dims

* change transpose to always require that perm not be empty

* clang format

* remove an unnecessary line

* fix tidy error

* fix review comments

bd85a76c

24 Aug, 2021 1 commit

Change attributes names to be more consistent and reflect better meaning (#916) · 0d2606bb

Umang Yadav authored Aug 24, 2021

* rename broadcast and multibroadcast output_lens attribute to out_lens attribute, and change tests and source code to reflect the same

* change the reshape attribute from dims to out_lens

* change transpose attribute's name from dims to perm to reflect better meaning

* use permutation instead of perm for transpose

clang formatting

* use dims instead of out_lens for reshape

clang formatting

0d2606bb

20 Aug, 2021 1 commit

unary scalar input processing (#912) · d689e2d1

Shucai Xiao authored Aug 20, 2021

* unary scalar input processing

* remove an unnecessary change

* remove unnecessary blank line

d689e2d1

18 Aug, 2021 1 commit

Optimize Q/DQ Format Pass (#889) · 0b5f33b6

turneram authored Aug 18, 2021

* Add operators, refactor parsers, add rewrite passes, add tests

* Add ref implementations

* Move broadcasting of scales and zero points to onnx parser

* Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type

* Switch certain variables to int64_t

* Fix overflow in implicit constant conversion

* Remove operators.hpp from includes in tf_test.cpp

* Add conversion for int32 input to quantizelinear and add test case; remove operators.hpp from onnx_test.cpp includes

* Switch dequantizelinear math from int32 to float

* Remove changes to operators.hpp

* Simplify apply_quantizelinear

* Add verify test for int32 data

* Add rewrite_quantization back to CMakeLists

* Add passes to insert qdq after add_bias is applied, replace quant_ops, and remove remaining qdq pairs

* Renaming, refactoring, cleaning up code, adding formal test, and adding passes to targets

* Renaming, review comments, begin adding more specific tests

* Add more specific unit tests

* Fix failing test on CI

* Correct matcher and update qop rewriting, update tests and add more tests

* Update matcher, clean up simplify_qdq, tweak tests

* Add tests, remove pass from CPU target, update dot parameters, clean up simplify_qdq

* Fix correctness bug in ref q/dq implementations; edit gemm parser to make beta always 0.0

* Remove unused variables in onnx gemm tests

0b5f33b6

10 Aug, 2021 1 commit

Add option to compile with hiprtc (#892) · 91c9ebbc

Paul Fultz II authored Aug 10, 2021

* Add hiprtc compile option
* Add cross compile test
* Update error reporting
* Add tests for errors and warnings
* Fix tidy warning
* Add comment to ifdefs
* Skip null character at end of log
* Assert there is null at the end

91c9ebbc