Commits · 70320bbda07e3b60dd5e4532081a4ac9147af439 · gaoqiong / MIGraphX

24 Nov, 2021 1 commit
- lowering fp16 lrn to fp32 lrn on the gpu target · 70320bbd
  Shucai Xiao authored Nov 24, 2021
  
  70320bbd
18 Nov, 2021 1 commit
- Parallel compilation (#1007) · b0bc71cd
  Paul Fultz II authored Nov 18, 2021
```
Do compilation in parallel
```
  b0bc71cd
11 Nov, 2021 1 commit

Conditionally enable pointwise fusion (#992) · 157935ff

Paul Fultz II authored Nov 10, 2021

This enables the pointwise fusions using the MIGRAPHX_ENABLE_POINTWISE_FUSION env variable. Its disabled by default since MIOpen fusions need to be refactored.

This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math passes with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.

157935ff

09 Nov, 2021 1 commit
- Failing fusion plan workaround (#995) · fb39e5e4
  turneram authored Nov 09, 2021
```
* Add workaround for devices that do not support miopen conv fusions
```
  fb39e5e4
28 Oct, 2021 2 commits

NonMaxSuppression op ref implementation (#968) · c98b22d8

Shucai Xiao authored Oct 28, 2021

This PR is the ref implementation of the nonmaxsuppression operator. It always returns the max possible output shape, which is the problem tracked in issue #948.

c98b22d8

Roialign gpu impl (#972) · 912c8d22

Shucai Xiao authored Oct 28, 2021

GPU implementation of the roialign operator, using the jit approach to reduce the lib size.

912c8d22

20 Oct, 2021 1 commit

Roialign (#952) · d7653732

Shucai Xiao authored Oct 20, 2021

Implementation of the roialign operator. For now, we have only the ref implementation. When we run a model on the GPU, we fall back the execution to use the ref implementation.

d7653732

08 Oct, 2021 2 commits

Nonzero op extension (#870) · 0879b5f1

Shucai Xiao authored Oct 08, 2021

This PR is for the nonzero operator with static output shape.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

0879b5f1

Remove alpha and beta from `dot` and `quant_dot` (#961) · 21193e87

Umang Yadav authored Oct 08, 2021

Previously dot operator was defined as C = alpha * A . B + beta * C where * is scalar multiplication and . is dot product or matrix multiplication depending on dimension of the inputs.

Aim is to have the definition of dot operator as C = A . B without having alpha or beta.

In order to achieve the same effect as alpha and beta (1) it multiplies the one of the inputs to the dot operator with alpha value. (2) if beta is present then, multiplies the C with beta and then adds into the output from step 1.

21193e87

01 Oct, 2021 1 commit

Add multinomial op (#954) · 0b7672d7

turneram authored Oct 01, 2021

Add multinomial op to onnx parser with ref and GPU implementations.

The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative density function. The multinomial operator multiplies the random values by the sum of the CDF and returns the index of the first element of the CDF that is greater than the result, representing samples randomly drawn from [0, class_size) that follow the log-probability distribution.

Resolves #821
Co-authored-by: Shucai Xiao <shucai@gmail.com>

0b7672d7

27 Sep, 2021 1 commit

Dpp opts for wavefront 32 (#951) · 6e2df9de

kahmed10 authored Sep 27, 2021

Checks wavefront size, then changes implementation and number of threads for DPP reduce

6e2df9de

17 Sep, 2021 2 commits

Revert "Remove alpha and beta attributes from dot operator (#945)" (#957) · 985f58b0
Paul Fultz II authored Sep 17, 2021
```
This reverts commit 9e43cb8b.
```
985f58b0

Remove alpha and beta attributes from dot operator (#945) · 9e43cb8b

Umang Yadav authored Sep 17, 2021

This PR aims to remove alpha and beta attributes from dot operator completely.

Previously dot operator was defined as C = alpha * A . B + beta * C where * is scalar multiplication and . is dot product or matrix multiplication depending on dimension of the inputs.

Aim is to have the definition of dot operator as C = A . B without having alpha or beta.

9e43cb8b

16 Sep, 2021 1 commit

Loop operator (#853) · a275f590

Shucai Xiao authored Sep 16, 2021

Add Loop operator for opset version 13.
Notes: 1) Default max iteration number is 10 if no max iteration number is provided
2) To change the max iter number, a user can set the max_loop_iterations in the onnx_option struct when parsing a model.
3) The returned shape of the scan output is from the max_loop_iterations even the actual loop num is less than that. This issue also applies to other operators like NonZero and NonMaxSuppression. A issue #948 is created to track this and to be resolved later.
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

a275f590

10 Sep, 2021 1 commit
- Assert the shape for compute and compute_shape are the same (#936) · 8b4c69c5
  Paul Fultz II authored Sep 10, 2021
```
Assert shapes dont change
Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
```
  8b4c69c5
02 Sep, 2021 2 commits

Refactor where op (#918) · ebbaf8fc

turneram authored Sep 02, 2021

Implement the Where operator for the CPU and GPU.  This is for better performance.

ebbaf8fc

Topk op (#877) · 521b57a2

Shucai Xiao authored Sep 01, 2021



* add topk operator doe ref, cpu and gpu
* Hash modules for quicker lookup of modules
* add onnx unit test
* add unit tests for the topk operator
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

521b57a2

01 Sep, 2021 1 commit

Adjust HIP_COMPILER_FLAGS to support <$:$<>:> and SHELL: tags (#933) · 33a17257

Chris Austen authored Sep 01, 2021



In ROCm 4.5.0 hip compile flags are coming in differently.  This has
caused some parsing issues for the HIP_COMPILER_FLAGS variable.  As an example

    ROCm 4.3.0: --offload-arch=gfx900
    ROCm 4.5.0: <$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900>

Using existing code...
    $<$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900>
Becomes...
    $<$<COMPILE_LANGUAGE:CXX>:SHELL:

There are two problems with that.
  1) The "<" is not balanced with a "> due to the regex consuming the ">"
  2) There is still a `SHELL:`  label.

This commit repairs both.  I took the regex parsing code from ROCmSoftwarePlatform/MIOpen/blame/develop/CMakeLists.txt
but improved it to support handling of target features like
<$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900:xxx+>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

33a17257

31 Aug, 2021 1 commit

Fix debug assert (#930) · bd85a76c

Shucai Xiao authored Aug 31, 2021

* fix two asserts for debug build

* add unit test for copy parameters

* clang format

* add a unit test for reorder_dims

* change tranpose to always require perm not be empty

* clang format

* remove an unnecessary line

* fix tidy error

* fix review comments

bd85a76c

24 Aug, 2021 1 commit

Change attributes names to be more consistent and reflect better meaning (#916) · 0d2606bb

Umang Yadav authored Aug 24, 2021

* rename broadcast and multibroadcast output_lens attribute to out_lens attribute, and change tests and source code to reflect the same

* change the reshape attribute from dims to out_lens

* change transpose attribute's name from dims to perm to reflect better meaning

* use permutation instead of perm for transpose

clang formaating

* use dims instead of out_lens for reshape

clang formatting

0d2606bb

19 Aug, 2021 1 commit
- Enable warnings when jit compiling (#913) · ccff6beb
  Paul Fultz II authored Aug 19, 2021
```
* Enable warnings when jit compiling

* Formatting
```
  ccff6beb
18 Aug, 2021 2 commits

Optimize Q/DQ Format Pass (#889) · 0b5f33b6

turneram authored Aug 18, 2021

* Add operators, refactor parsers, add rewrite passes, add tests

* Add ref implementations

* Move broadcasting of scales and zero points to onnx parser

* Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type

* Switch certain variables to int64_t

* Fix overflow in implicit constant conversion

* Remove operators.hpp from includes in tf_test.cpp

* Add conversion for int32 input to quantizelinear and add test case; remove operators.hpp from onnx_test.cpp includes

* Switch dequantizelinear math from int32 to float

* Remove changes to operators.hpp

* Simplify apply_quantizelinear

* Add verify test for int32 data

* Add rewrite_quantization back to CMakeLists

* Add passes to insert qdq after add_bias is applied, replace quant_ops, and remove remaining qdq pairs

* Renaming, refactoring, cleaning up code, adding formal test, and adding passes to targets

* Renaming, review comments, begin adding more specific tests

* Add more specific unit tests

* Fix failing test on CI

* Correct matcher and update qop rewriting, update tests and add more tests

* Update matcher, clean up simplify_qdq, tweak tests

* Add tests, remove pass from CPU target, update dot parameters, clean up simplify_qdq

* Fix correctness bug in ref q/dq implementations; edit gemm parser to make beta always 0.0

* Remove unused variables in onnx gemm tests

0b5f33b6

Fix error: namespace "std" has no member "cout" (#911) · 4e3b2e3c
turneram authored Aug 18, 2021
```
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
```
4e3b2e3c

10 Aug, 2021 1 commit

Add option to compile with hiprtc (#892) · 91c9ebbc

Paul Fultz II authored Aug 10, 2021

* Add hiprtc compile option
* Add cross compile test
* Update error reporting
* Add tests for errors and warnings
* Fix tidy warning
* Add comment to ifdefs
* Skip null character at end of log
* Assert there is null at the end

91c9ebbc

09 Aug, 2021 1 commit
- check for divisor encodable or not, fallback if needed (#906) · a8d86615
  Cagri Eryilmaz authored Aug 09, 2021
```
* check for divisor encodable or not, fallback if needed

* verify test for retinaface case
```
  a8d86615
05 Aug, 2021 1 commit

Add gpu driver and improvements to pointwise codegen (#851) · 29fa2666

Paul Fultz II authored Aug 05, 2021



* Add method to compile pointwise

* Formatting

* Add lambda

* Add semicolon

* Rename variable

* Add driver to run jit kernels

* Formatting

* Add context

* Formatting

* Make seperate driver folder

* Add more general gpu driver

* Formatting

* Print out wll time

* Formatting

* Run multiple times and skip first run

* Formatting

* Seperate time_op

* Run an op for comparison

* Formatting

* Add debug asserts

* Formatting

* Change parameer name

* Formatting

* Fix argument order

* Formatting

* Add preloading

* Formatting

* Allow a different data type

* Formatting

* Pipeline transformations

* Formatting

* Add vectorization

* Formatting

* Reduce dims

* Formatting

* Compile with launch params as constant

* Formatting

* Make sure buffer can be vecotrized

* Formatting

* Enable vectorization and preloading

* Formatting

* Add print header

* Formatting

* Avoid allocating to large of LDS

* Formatting

* Add some vec functions to a seperate header

* Formatting

* Add stride loops

* Formatting

* Improve the transform pipeline

* Formatting

* Add const

* Fix shape check

* Formatting

* Just check stride axis is zero

* Remove extra finc_vector_axis overload

* Simplify some mroe functions

* Formatting

* Remove some more extra functions

* Formatting

* Simplify more decltypes

* Add another const

* Fix test

* Get buffer pointer different for older compilers
Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

29fa2666

15 Jul, 2021 1 commit

Quantize linear ops (#843) · 3282e01a

turneram authored Jul 15, 2021

* Add operators, refactor parsers, add rewrite passes, add tests

* Formatting

* Fix cppcheck

* Review comments

* Formatting

* Combine rewrite passes

* Formatting

* Add ref implementations

* Formatting

* Review comments

* Formatting

* Tidy warnings

* Apply review comments

* Formatting

* Fix CI error

* Formatting

* Increase code coverage

* Formatting

* Move broadcasting of scales and zero points to onnx parser

* Formatting

* Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type

* Formatting

* Increase code coverage

* Formatting

* Switch certain variables to int64_t

* Formatting

* Fix overflow in implicit constant conversion

* Formatting

* Increase code coverage

* Formatting

* Remove operators.hpp from includes in tf_test.cpp

* Formatting

* Add conversion for int32 input to quantizelinear and add test case; remove operators.hpp from onnx_test.cpp includes

* Formatting

* Switch dequantizelinear math from int32 to float

* Formatting

* Remove changes to operators.hpp

* Simplify apply_quantizelinear

* Formatting

* Add verify test for int32 data

* Add rewrite_quantization back to CMakeLists

3282e01a

14 Jul, 2021 1 commit

Use the same device name function in the unit tests (#881) · 0b04fc80

Paul Fultz II authored Jul 14, 2021



* Unify device_name function

* Formatting
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

0b04fc80

12 Jul, 2021 1 commit
- fixed a build error related to rocm3.7 (#873) · c2d7c96e
  Shucai Xiao authored Jul 12, 2021
```
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
```
  c2d7c96e
09 Jul, 2021 1 commit
- fix review comments · 4e861c7c
  Shucai Xiao authored Jul 08, 2021
  
  4e861c7c
08 Jul, 2021 4 commits

Add inclusive scan on the GPU (#872) · 6ba279cc

Paul Fultz II authored Jul 08, 2021



* Add initial scan operator

* Formatting

* Fix with a working test

* Fix bugs

* Formatting

* Formatting

* Simplify

* Formatting

* Use non-power of 2 for test

* Make pointer
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

6ba279cc

Preallocate parameters on the CPU and unify preallocations (#840) · 427fc25c

Paul Fultz II authored Jul 08, 2021



* Add preallocate method

* Add preallocate_param pass

* Preallocate buffers on the cpu

* Formatting

* Preallocate on the gpu

* Add missing cpp file

* Formatting

* Add lifetime function

* Formatting

* Always allocate

* Fix tidy warning

* Add const

* Add missing lifetime annotations
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

427fc25c

clang format · 4a24a2dd
Shucai Xiao authored Jul 08, 2021

4a24a2dd
fix review comments · 0281a411
Shucai Xiao authored Jul 08, 2021

0281a411

27 Jun, 2021 1 commit

Inline subgraph (#802) · bc52a8a8

Shucai Xiao authored Jun 27, 2021



* Add definitions for all pointwise operators

* Formatting

* Add cpp generator class

* Formatting

* Move compilation to core

* Formatting

* Add clock to tmp name

* Add dynamic loader

* Formatting

* Add tests for code gen

* Formatting

* Add test for literals

* Formatting

* Use with_char

* Add missing header

* Fix mismerge

* Ignore tidy warning

* Fxx gcc 5 errors

* Apply fixits

* Skip signed bitwise of status

* Remove unused parameters

* Explicitly add c++14 flag

* Fix tidy warning

* unify the compute function signature

* clang format

* make another change

* unify the compute function

* clang format

* remove unnecessary code

* more refinement about the operator compute funciton

* clang format

* add an overload function

* clang format

* add support for axes inputs for sequeeze/unsqueeze/reduce_sum

* clang format

* fix build problems

* backup code changes

* clang format

* Add tuple type to shape class

* Formatting

* fix a bug in parsing quantizelinear operator

* clang format

* fix a cppcheck error

* disable different versions of unit tests for different onnx version

* clang format

* upgrade onnx to 1.8

* update onnx to 1.8.1

* disable two more real models

* clang format

* Make data member private

* Formatting

* Add sub arguments

* Formatting

* Trun clang format off

* Disable clang-format

* fix review comments

* fix the function of assign axes in parsing the squeeze operator

* add unit tests and fix a bug

* clang format

* fix review comments

* clang format

* fix a build error

* backup code changes

* clang format

* add more unit tests and add parsing opset version

* clang format

* Improve visiting tuples

* Formatting

* fix cppcheck error

* adding installing the onnx package

* resolve no protobuf compiler

* add an inline subgraph pass

* clang format

* Add more argument tests

* Formatting

* Handle tuple in load

* Formatting

* code backup

* clang format

* Remove .o files

* Add tuple type to api

* Formatting

* fix build errors

* clang format

* code backup

* code backup

* add unit tests for the inline subgraph

* clang format

* refine the inline subgraph and parse if operator

* clang format

* fix cppcheck issue

* clang format

* add unit test for inline subgraph pass

* clang format

* fix format issue

* remove the context from the if operator

* clang format

* simplify the compute functions

* Fix tidy warnings

* fix cppcheck error

* clang format

* fix cppcheck error

* Fix tidy warnings

* fix a cppcheck error

* clang format

* Add a test for share method

* Formatting

* Add a test cpp_type

* add unit tests for more code coverage

* clang format

* add unit tests to have more code coverage

* clang format

* try a comment in jenkins build

* include the install onnnx line

* code backup

* reorder the dependenciesd installed

* refine dockerfile

* fix review comments

* clang format

* remove unnecessary overload function

* fix cppcheck error

* change back the argument test

* Suppress tidy warning

* add the operator get_tuple_elem

* clang format

* add get_tuple_elem to operator include file

* chang if to support multiple operation outputs

* clang format

* optimize inline subgraph

* clang format

* code backup

* clang format

* fix bug

* refine unit tests for tuple output of the if operator

* clang format

* refine a instruction replacement code

* add a unit test and sort all the unit tests alphabetically

* fix cppcheck error

* add more unit tests for multiple op outputs

* clang format

* fix cppcheck error

* Update pass manager to get modules after every pass

* more unit test to cover more scenarios

* clang format

* fixed a bug in a unit test

* add more tests

* clang format

* add more unit tests to have more code coverage

* fix a bug in a unit test

* Add program overload for module

* Formatting

* Hash modules for quicker lookup of modules

* Bump file version

* Add methods to remove modules

* Formatting

* add the tuple type to the support list

* Eliminate unused modules

* Formatting

* Fix test errors

* Foramtting

* Fix tidy issues

* fix problem related to inline subgraph

* clang format

* fix review comments

* fix review comments

* fix review comments

* fix review comments

* clang format

* fix a unit test

* one more code change

* remove an optimization related to the if operator

* clang format

* fix review comments
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

bc52a8a8

25 Jun, 2021 4 commits
- clang format · 27c0ae08
  Shucai Xiao authored Jun 25, 2021
  
  27c0ae08
- refine unit tests · dd651742
  Shucai Xiao authored Jun 25, 2021
  
  dd651742
- clang format · 7135da72
  Shucai Xiao authored Jun 25, 2021
  
  7135da72
- gpu implementation of the scatter operator · 973cafd4
  Shucai Xiao authored Jun 25, 2021
  
  973cafd4
15 Jun, 2021 1 commit

Int8 gemm support (#811) · 39bc6161

Shucai Xiao authored Jun 15, 2021



* add a flag to indicate int8x4 input format

* clang format

* code backup

* clang format

* code backup

* clang format

* code backup

* clang format

* code backup

* clang format

* code backup

* clang format

* remove log info

* remove unnecessary changes

* fix cppcheck error

* add unit tests to have more code coverage

* clang format

* add debug info

* remove log info

* fix cppcheck error

* clang format

* clang format

* add one more unit tests for more scenarios

* fix cppcheck error

* clang format

* fix review comments

* clang format

* rename p to m

* fix review comments

* refine unit tests

* clang format

* refine unit tests and fixed a bug

* clang format

* fix build error related to rocm4.2

* fix a bug related to alpha and beta

* refine two unit tests related to int8_gemm

* fix cppcheck error

* refine unit test to pass on mi100

* add unit test for packing int8 args

* clang format

* change unit tests back

* disable some unit tests for gpu

* clang format

* refine unit tests to run on mi100

* clang format

* refine unit tests

* refine unit tests

* clang format

* change back a unit test
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

39bc6161