Commits · 7dcae03776938bf983ef6184a13bb9bcada678b5 · gaoqiong / MIGraphX

29 Jul, 2022 1 commit

Avoid registering host buffer ptr multiple times during hip copies (#1245) · 7596f3f1

Umang Yadav authored Jul 29, 2022

Currently, when copying a host buffer to the device, the copy first registers (maps) the host buffer pointer into the device's address space.

If the host buffer was allocated with hipHostMalloc, it is already implicitly registered in the device's address space and does not need to be registered again. This PR adds a check for that case.
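The check described above can be illustrated with a toy model. This is only a sketch in Python, not the HIP code from the PR: `ToyDevice`, `host_malloc`, `register`, and `copy_to_device` are hypothetical names, and integer values stand in for host pointers.

```python
class ToyDevice:
    """Hypothetical stand-in for a GPU runtime that tracks mapped host pointers."""

    def __init__(self):
        self.registered = set()   # host pointers visible in the device address space
        self.register_calls = 0   # counts explicit registrations only

    def host_malloc(self, ptr):
        # Models a pinned allocation (hipHostMalloc in the commit message):
        # the buffer is implicitly registered at allocation time.
        self.registered.add(ptr)
        return ptr

    def register(self, ptr):
        # Models an explicit registration of an ordinary host buffer.
        self.register_calls += 1
        self.registered.add(ptr)

    def is_registered(self, ptr):
        return ptr in self.registered


def copy_to_device(dev, ptr):
    """Register the host pointer only if the device does not already know it."""
    if not dev.is_registered(ptr):
        dev.register(ptr)
    return ("device_copy", ptr)
```

In this model a buffer obtained from `host_malloc` is copied without any explicit registration, while an ordinary buffer triggers exactly one `register` call.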

7596f3f1

03 Jul, 2022 1 commit

Add mlir fusion (#1251) · ca8a54fe

Paul Fultz II authored Jul 03, 2022

* Add mlir c api

* Formatting

* Create a type attribute

* Formatting

* Parse module

* Formatting

* Add mlir dump function

* Add test case

* Formatting

* Fix tidy issues

* Update mlir version

* Update to newer mlir

* Format

* Move mlir to the gpu and update the test

* Formatting

* Fix bug when appending module

* Format

* Remove old cmake flag

* Update message

* Add return

* Format

* Add mlir_compile

* Format

* Register dialect

* Handle unsigned integers

* Don't provide output for return instruction

* Format

* Add code to insert memrefs

* Format

* Add mlir verification

* Formatting

* Enable pointwise_fusion

* Disable eliminate_data_type

* Set kernel name

* Format

* Fix device name

* Formatting

* Fix output arg

* Format

* Updates

* Update hash

* Add fuse_mlir pass

* Format

* Add fuse mlir

* Format

* Update mlir

* Sort parameter names

* Format

* Reenable disabled passes

* Remove old mlir conv

* Remove asym default padding

* Add more verbose tracing

* Format

* Fix compilation errors

* Format

* Whitelist operators

* Format

* Add namespace

* Format

* Update triple

* Format

* Use func dialect

* Format

* Use func.return

* Format

* Upgrade mlir version

* Add comment

* Handle symmetrical padding

* Format

* Cleanup debug output

* Format

* List failed tests

* Move mlir compile to jit pipeline

* Format

* Update version

* Add source locations

* Format

* Correctly add module

* Format

* Update failed tests

* Fix failures when mlir is disabled

* Format

* Update mlir version

* Check type for fp32

* Format

* Remove failed test

* Update mlir in driver

* Tidy fixes

* Format

* Tidy fixes

* Format

* Fix const

* Remove from requirements

* Fix cmake version

* Fix tidy warning

* Use another ifdef

* Fix tidy

* Other tidy fix

* Format

* Update hash

* Add missing license files

* Format

* Format

* Fix function name

ca8a54fe

22 Jun, 2022 1 commit
- Update license files (#1248) · e44cecbc
  Ted Themistokleous authored Jun 22, 2022
```
Updated each source file in the repo with the existing license.
```
  e44cecbc
17 Jun, 2022 1 commit

Create allocate op and replace_allocate pass (#1183) · add6fb3b

kahmed10 authored Jun 17, 2022



* add allocate op header

* formatting

* add replace_allocate pass

* formatting

* move output param to remove_allocate pass

* formatting

* fix bugs in replace_allocate pass

* formatting

* fix verify if tests

* formatting

* move if op logic

* formatting

* cleanup lowering

* cleanup lowering

* formatting

* fix tidy

* formatting

* fix tidy

* add cpu allocate check

* formatting

* change cpu allocate in pass

* formatting

* add some tests for replace_allocate pass

* formatting

* pass by ref

* fix run_pass

* formatting

* update variable name for module

* update dce to use contains() and fix tidy

* formatting

* update cppcheck

* add if test

* formatting

* add if test

* rename var to mod_output_names

* formatting

* remove conditional

* update allocate op and tests

* formatting

* update replace_allocate tests

* update create_output_names() and conditional in replace_allocate

* formatting

* remove extra variable in replace_allocate

* update tools script for allocation_model

Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

add6fb3b

06 May, 2022 1 commit
- Add compile tests for gpu math functions (#1182) · 6a5cda96
  Paul Fultz II authored May 06, 2022
```
Add compile tests for gpu math functions
```
  6a5cda96
29 Mar, 2022 1 commit

Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6

Paul Fultz II authored Mar 29, 2022

This adds the infrastructure to compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. It will also integrate directly with lowering and the gpu-driver. The pointwise and roialign kernels use this infrastructure. Scatternd does not, since it does require standard shape.

This also makes it easier to add new runtime compiled kernels in the future.
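The idea of compiling every runtime kernel in parallel can be sketched as follows. This is a rough Python illustration, not the compile_ops pipeline itself: `compile_kernel`, `compile_all`, and the returned strings are hypothetical placeholders for real kernel compilation.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_kernel(name):
    """Hypothetical stand-in for compiling one runtime kernel."""
    return f"{name}.code_object"


def compile_all(kernel_names, max_workers=4):
    """Compile all kernels concurrently and return a name -> result mapping."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs,
        # so zipping with kernel_names pairs each name with its result.
        return dict(zip(kernel_names, pool.map(compile_kernel, kernel_names)))


compiled = compile_all(["pointwise", "roialign"])
```

Adding a new kernel kind in this sketch only requires passing another name to `compile_all`, which mirrors the stated goal of making new runtime compiled kernels easier to add.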

661046c6

25 Feb, 2022 1 commit
- Add get_queue to context to get the current stream (#1097) · e5242676
  Paul Fultz II authored Feb 24, 2022
```
The stream is wrapped in an any_ptr class so the type can be checked at runtime for a mismatch.
```
  e5242676
09 Feb, 2022 1 commit
- Enable pointwise fusion by default (#1082) · c7419a9c
  Paul Fultz II authored Feb 09, 2022
```
There is now a MIGRAPHX_DISABLE_POINTWISE_FUSION environment variable to disable it.
```
  c7419a9c
17 Nov, 2021 1 commit

Handle removing contiguous on operators that use modules (#1005) · 785307c3

Paul Fultz II authored Nov 17, 2021

Currently, eliminate_contiguous never removes contiguous for operators that use module inputs, because it doesn't pass the module inputs to compute_shape.

- Update to pass the module inputs correctly to compute_shape
- Fix the overloads of compute_shape so that when passed an empty vector of module inputs it will call the overload without module inputs
- Add tests with contiguous and pointwise module function.
- Move add_pointwise function to a separate header to reuse across different tests
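
The overload behavior in the second bullet can be sketched as a small dispatcher. This is a Python approximation of what would be C++ overloads in the actual code. `PlainOp`, `ModuleOp`, and the free function `compute_shape` are hypothetical names used only for illustration.

```python
class PlainOp:
    """Hypothetical operator that only has the overload without module inputs."""

    def compute_shape(self, inputs):
        return inputs[0]


class ModuleOp:
    """Hypothetical operator that takes module inputs as a second argument."""

    def compute_shape(self, inputs, mod_args):
        return (inputs[0], len(mod_args))


def compute_shape(op, inputs, mod_args=()):
    """Forward to the plain overload when the module input list is empty."""
    if len(mod_args) == 0:
        return op.compute_shape(inputs)
    return op.compute_shape(inputs, mod_args)
```

With this forwarding, a caller such as a pass can always hand over the module inputs it has, and operators without module support still work as long as that list is empty.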

785307c3

08 Oct, 2021 1 commit

Remove alpha and beta from `dot` and `quant_dot` (#961) · 21193e87

Umang Yadav authored Oct 08, 2021

Previously, the dot operator was defined as C = alpha * A . B + beta * C, where * is scalar multiplication and . is the dot product or matrix multiplication, depending on the dimensions of the inputs.

The aim is to define the dot operator as C = A . B, without alpha or beta.

To achieve the same effect as alpha and beta: (1) one of the inputs to the dot operator is multiplied by alpha; (2) if beta is present, C is multiplied by beta and added to the output of step 1.
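
The equivalence between the old fused definition and the two-step decomposition can be checked on a small example. The sketch below uses plain Python lists rather than MIGraphX operators. All function names (`matmul`, `scale`, `add`, `old_dot`, `new_dot_with_scaling`) are illustrative assumptions, not library APIs.

```python
def matmul(a, b):
    """Plain matrix product of two row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scale(m, s):
    """Multiply every element of m by the scalar s."""
    return [[s * v for v in row] for row in m]


def add(m, n):
    """Elementwise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m, n)]


def old_dot(a, b, c, alpha=1.0, beta=0.0):
    """Old definition: alpha * (A . B) + beta * C inside one operator."""
    return add(scale(matmul(a, b), alpha), scale(c, beta))


def new_dot_with_scaling(a, b, c=None, alpha=1.0, beta=0.0):
    """New style: dot is just A . B, with scaling done by separate steps."""
    out = matmul(scale(a, alpha), b)      # step 1: scale one input by alpha
    if c is not None:
        out = add(out, scale(c, beta))    # step 2: add beta * C to the result
    return out


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
```

For these inputs with alpha = 2 and beta = 0.5, both functions give the same matrix, because scaling A by alpha before the product is the same as scaling the product afterwards.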

21193e87

07 Sep, 2021 1 commit

qdq for quantization and include subgraph (#891) · b45f7239

Shucai Xiao authored Sep 07, 2021



Add operators, refactor parsers, add rewrite passes, add tests
Add ref implementations
Move broadcasting of scales and zero points to onnx parser
Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type
fp16 and fp8 quantization to include subgraph and parameters
fix unit test to use qdq operators for int8 quantization

Co-authored-by: turneram <alturner@amd.com>

b45f7239

31 Aug, 2021 1 commit

Fix debug assert (#930) · bd85a76c

Shucai Xiao authored Aug 31, 2021

* fix two asserts for debug build

* add unit test for copy parameters

* clang format

* add a unit test for reorder_dims

* change transpose to always require perm not be empty

* clang format

* remove an unnecessary line

* fix tidy error

* fix review comments

bd85a76c

24 Aug, 2021 1 commit

Change attributes names to be more consistent and reflect better meaning (#916) · 0d2606bb

Umang Yadav authored Aug 24, 2021

* rename broadcast and multibroadcast output_lens attribute to out_lens attribute, and change tests and source code to reflect the same

* change the reshape attribute from dims to out_lens

* change transpose attribute's name from dims to perm to reflect better meaning

* use permutation instead of perm for transpose

clang formatting

* use dims instead of out_lens for reshape

clang formatting
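
Read in order, the bullets above suggest a net set of renames, since later bullets override earlier ones. The Python sketch below records that reading as a lookup table. `RENAMES` and `migrate_attributes` are hypothetical helpers for illustration, and the net result is an interpretation of the bullets rather than something the commit states explicitly.

```python
# Net attribute renames as read from the bullets (an assumption).
# reshape is absent on purpose: dims -> out_lens was later reverted to dims.
RENAMES = {
    "broadcast": {"output_lens": "out_lens"},
    "multibroadcast": {"output_lens": "out_lens"},
    "transpose": {"dims": "permutation"},
}


def migrate_attributes(op_name, attrs):
    """Return a copy of attrs with old attribute names replaced by new ones."""
    mapping = RENAMES.get(op_name, {})
    return {mapping.get(key, key): value for key, value in attrs.items()}
```

A helper of this shape could be used to update attribute dictionaries written against the old names, leaving unknown operators and attributes untouched.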

0d2606bb

19 Aug, 2021 1 commit
- Enable warnings when jit compiling (#913) · ccff6beb
  Paul Fultz II authored Aug 19, 2021
```
* Enable warnings when jit compiling

* Formatting
```
  ccff6beb
10 Aug, 2021 1 commit

Add option to compile with hiprtc (#892) · 91c9ebbc

Paul Fultz II authored Aug 10, 2021

* Add hiprtc compile option
* Add cross compile test
* Update error reporting
* Add tests for errors and warnings
* Fix tidy warning
* Add comment to ifdefs
* Skip null character at end of log
* Assert there is null at the end

91c9ebbc

05 Aug, 2021 1 commit

Add gpu driver and improvements to pointwise codegen (#851) · 29fa2666

Paul Fultz II authored Aug 05, 2021



* Add method to compile pointwise

* Formatting

* Add lambda

* Add semicolon

* Rename variable

* Add driver to run jit kernels

* Formatting

* Add context

* Formatting

* Make separate driver folder

* Add more general gpu driver

* Formatting

* Print out wall time

* Formatting

* Run multiple times and skip first run

* Formatting

* Separate time_op

* Run an op for comparison

* Formatting

* Add debug asserts

* Formatting

* Change parameter name

* Formatting

* Fix argument order

* Formatting

* Add preloading

* Formatting

* Allow a different data type

* Formatting

* Pipeline transformations

* Formatting

* Add vectorization

* Formatting

* Reduce dims

* Formatting

* Compile with launch params as constant

* Formatting

* Make sure buffer can be vectorized

* Formatting

* Enable vectorization and preloading

* Formatting

* Add print header

* Formatting

* Avoid allocating too much LDS

* Formatting

* Add some vec functions to a separate header

* Formatting

* Add stride loops

* Formatting

* Improve the transform pipeline

* Formatting

* Add const

* Fix shape check

* Formatting

* Just check stride axis is zero

* Remove extra find_vector_axis overload

* Simplify some more functions

* Formatting

* Remove some more extra functions

* Formatting

* Simplify more decltypes

* Add another const

* Fix test

* Get buffer pointer different for older compilers
Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

29fa2666

14 Jul, 2021 1 commit

Use the same device name function in the unit tests (#881) · 0b04fc80

Paul Fultz II authored Jul 14, 2021



* Unify device_name function

* Formatting

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

0b04fc80

15 Jun, 2021 1 commit

Int8 gemm support (#811) · 39bc6161

Shucai Xiao authored Jun 15, 2021



* add a flag to indicate int8x4 input format

* clang format

* code backup

* clang format

* code backup

* clang format

* code backup

* clang format

* code backup

* clang format

* code backup

* clang format

* remove log info

* remove unnecessary changes

* fix cppcheck error

* add unit tests to have more code coverage

* clang format

* add debug info

* remove log info

* fix cppcheck error

* clang format

* clang format

* add one more unit tests for more scenarios

* fix cppcheck error

* clang format

* fix review comments

* clang format

* rename p to m

* fix review comments

* refine unit tests

* clang format

* refine unit tests and fixed a bug

* clang format

* fix build error related to rocm4.2

* fix a bug related to alpha and beta

* refine two unit tests related to int8_gemm

* fix cppcheck error

* refine unit test to pass on mi100

* add unit test for packing int8 args

* clang format

* change unit tests back

* disable some unit tests for gpu

* clang format

* refine unit tests to run on mi100

* clang format

* refine unit tests

* refine unit tests

* clang format

* change back a unit test

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

39bc6161

03 May, 2021 1 commit

Reduce types generated for hip kernels (#814) · 3becd974

Paul Fultz II authored May 03, 2021



* Remove unused data types

* Formatting

* Reduce types generated for hip kernels

* Formatting

* Fix onnx tests

* Formatting

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

3becd974

22 Apr, 2021 1 commit

Cpu fusions using post_ops (#781) · f7befe50

Paul Fultz II authored Apr 22, 2021



* Add eliminate_data_type pass

* Formatting

* Auto convert quant ops

* Formatting

* Flip the order of decompose

* Compute max size differently

* Formatting

* Clamp values in convert

* Formatting

* Fix loss of precision in reduce

* Formatting

* Fix bugs in reduction

* Fix accumulator type in reference softmax implementation

* Formatting

* Update convert test

* Remove unused variables

* Remove unnecessary quant_dot check

* Formatting

* Add tests

* Formatting

* Remove unused code

* Remove duplicate ops

* Remove blaze dependency

* Use set since shape::type_t is not hashable on gcc 5

* Formatting

* Add dnnl binary op

* Formatting

* Add binary and eltwise

* Formatting

* Add softmax

* Formatting

* Remove unused operators

* Add missing files

* Formatting

* Add lrn

* Formatting

* Add deconvolution

* Formatting

* Change allocate default

* Add reorder

* Formatting

* Add reductions

* Formatting

* Sort lines

* Change literals in another loop

* Add pow operator

* Formatting

* Add pow operator

* Formatting

* Make sure shapes are packed

* Allow broadcasted inputs

* Remove unused operators

* Simplify functions

* Remove softmax

* Add sub and erf functions

* Formatting

* Fix bug

* Formatting

* Improve parallelism

* Formatting

* Allow multiple batch dimensions

* Formatting

* Move literal transforms out of lowering

* Formatting

* Add gather operator

* Sort lines

* Add early exit for carry

* Formatting

* Add missing concat

* Rename macro

* Fix deep nesting

* Formatting

* Fix cppcheck issues

* Remove else

* Move attribute to typedef

* Formatting

* Disable maybe-uninitialized warning since it's broken on gcc

* Add constexpr default constructor

* Formatting

* Fix compiler warnings

* Fix adjust_allocation test

* Add layernorm matcher

* Add gelu_erf matcher

* Formatting

* Add gelu_tanh matcher

* Formatting

* Remove match namespace

* Formatting

* Use matcher instead of string

* Formatting

* Add fusions

* Formatting

* Add post op field

* Formatting

* Make post_ops serializable

* Formatting

* Add eltwise fusions

* Formatting

* Fix null conversions

* Formatting

* Add fuse_ops source files

* Formatting

* Set binary post op index correctly

* Formatting

* Fix serialization bugs

* Check if used once

* Formatting

* Fix error in get_primitive_attr

* Formatting

* Add compile function

* Formatting

* Limit fusions

* Formatting

* Disable with env variable instead of using compile arg

* Formatting

* Fix implicit conversion to bool

* Declare on separate lines

* Formatting

* Fix cppcheck issues

* Fix ICE in pack_join

* Formatting

* Use const ref

* Make enum hashable

* Formatting

* Add explicit this

* Fix merge issues

* Fix dangling ref

* Formatting

* Add test for compile

* Formatting

* Add more value tests

* Formatting

Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

f7befe50

19 Apr, 2021 1 commit

Add code generation for pointwise operators (#780) · 35d1bcc2

Paul Fultz II authored Apr 19, 2021

* Add definitions for all pointwise operators

* Formatting

* Add cpp generator class

* Formatting

* Move compilation to core

* Formatting

* Add clock to tmp name

* Add dynamic loader

* Formatting

* Add tests for code gen

* Formatting

* Add test for literals

* Formatting

* Use with_char

* Add missing header

* Fix mismerge

* Ignore tidy warning

* Fix gcc 5 errors

* Apply fixits

* Skip signed bitwise of status

* Remove unused parameters

* Explicitly add c++14 flag

* Fix tidy warning

* Remove .o files

35d1bcc2

26 Mar, 2021 1 commit

Add initial code generation (#762) · 581d31b0

Paul Fultz II authored Mar 26, 2021



* Add code object op

* Formatting

* Add more value tests

* Formatting

* Fix from_value conversion from binary

* Formatting

* Don't use offload copy

* Remove iostream header

* Fix compilation errors

* Formatting

* Rename var

* Add missing files

* Formatting

* Remove duplicate variable

* Remove comment

* Template the function so sfinae will work

* Formatting

* Use template specialization since ADL is broken on hcc

* Formatting

* Annotate the constructor with HD for hcc

* Make variable const

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

581d31b0

26 Feb, 2021 1 commit

Add more supported operators and optimizations for the cpu backend (#746) · a0b570b2

Paul Fultz II authored Feb 26, 2021



* Add eliminate_data_type pass

* Formatting

* Auto convert quant ops

* Formatting

* Flip the order of decompose

* Compute max size differently

* Formatting

* Clamp values in convert

* Formatting

* Fix loss of precision in reduce

* Formatting

* Fix bugs in reduction

* Fix accumulator type in reference softmax implementation

* Formatting

* Update convert test

* Remove unused variables

* Remove unnecessary quant_dot check

* Formatting

* Add tests

* Formatting

* Remove unused code

* Remove duplicate ops

* Remove blaze dependency

* Use set since shape::type_t is not hashable on gcc 5

* Formatting

* Add dnnl binary op

* Formatting

* Add binary and eltwise

* Formatting

* Add softmax

* Formatting

* Remove unused operators

* Add missing files

* Formatting

* Add lrn

* Formatting

* Add deconvolution

* Formatting

* Change allocate default

* Add reorder

* Formatting

* Add reductions

* Formatting

* Sort lines

* Change literals in another loop

* Add pow operator

* Formatting

* Add pow operator

* Formatting

* Make sure shapes are packed

* Allow broadcasted inputs

* Remove unused operators

* Simplify functions

* Remove softmax

* Add sub and erf functions

* Formatting

* Fix bug

* Formatting

* Improve parallelism

* Formatting

* Allow multiple batch dimensions

* Formatting

* Move literal transforms out of lowering

* Formatting

* Add gather operator

* Sort lines

* Add early exit for carry

* Formatting

* Add missing concat

* Rename macro

* Fix deep nesting

* Formatting

* Fix cppcheck issues

* Remove else

* Move attribute to typedef

* Formatting

* Disable maybe-uninitialized warning since it's broken on gcc

* Add constexpr default constructor

* Formatting

* Fix compiler warnings

* Fix adjust_allocation test

Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

a0b570b2

25 Feb, 2021 1 commit

Add code object custom op (#744) · 7220dd18

Paul Fultz II authored Feb 24, 2021



* Add code object op

* Formatting

* Add more value tests

* Formatting

* Fix from_value conversion from binary

* Formatting

* Don't use offload copy

* Remove iostream header

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

7220dd18

06 Jan, 2021 1 commit

Module impl (#678) · c9b86f1c

Shucai Xiao authored Jan 06, 2021



* add an api get_main_module

* clang format

* modify onnx unit test for module

* clang format

* refactor ops unit test with the get_main_module

* clang format

* code backup

* clang format

* refine module c api

* add python api for module

* clang format

* fix a python api issue

* clang format

* fix cppcheck error

* clang format

* refine unit tests changes

* clang format

* code backup

* code backup

* clang format

* defer some changes to later PRs

* change return of get_main_module from ref to pointer

* clang format

* add unit tests for the get_main_module_api

* clang format

* fix cppcheck error

* clang format

* fix cppcheck error

* clang format

* add more unit tests for more code change coverage

* clang format

* fixed a unit test error

* clang format

* fix unit test

* clang format

* code backup

* code change for more code coverage

* change program to module in various passes and matcher

* clang format

* modify the pass API

* code backup

* code backup

* clang format

* code backup

* clang format

* Add option to not generate a destroy method

* Formatting

* fix some review comments

* clang format

* fix review comments

* clang format

* clang format

* code backup

* code backup

* clang format

* fix cppcheck errors

* clang format

* clang format

* fix build errors

* clang format

* modify gpu unit tests to using module

* clang format

* fix cppcheck error

* clang format

* Add flag to enable cpu backend

* Make buffers shared

* Enable optimizations

* Formatting

* fix review comments

* code backup

* clang format

* code backup

* clang format

* fix a bug related to a unit test

* clang format

* clang format

* fix a build error

* remove unnecessary code

* remove unnecessary files

* code backup

* clang format

* remove the compile function from the module class

* clang format

* clang format

* remove the context parameter from the from_value method of the module class

* code refinement

* clang format

* merge changes from develop branch

* clang format

* fix cppcheck error

* clang format

* fix a build error

* fixed a merge error

* fix cppcheck error

* fixed review comments

* clang format

* fix cppcheck error

* fix a cppcheck error

* fix cppcheck error

* fix build error caused by merge

* Add missing has_op function

* Formatting

* merge changes from develop branch

* fix a cppcheck error

* fixed some review comments

* clang format

* remove the begin/end function of the program class

* clang format

* refine code and fix cppcheck error

* clang format

* fix review comments

* clang format

* fix review comments

* clang format

* add unit tests for more code coverage

* clang format

* fix review comments

* clang format

* fix review comments

* clang format

* fix a build error in debug mode

* clang format

Co-authored-by: Paul <pfultz2@yahoo.com>

c9b86f1c

14 Dec, 2020 1 commit

Use dnnl for cpu backend (#688) · 406afeb8

Paul Fultz II authored Dec 14, 2020



* Add flag to enable cpu backend

* Make buffers shared

* Enable optimizations

* Add onednn

* Formatting

* Formatting

* Add dnnl header

* Formatting

* Rewrite rnn first

* Formatting

* Call reference implementation

* Formatting

* Make literal data shared

* Formatting

* Add convolution

* Formatting

* Compensate for dilation

* Formatting

* Use name/make_op instead

* Formatting

* Rename gemm header

* Formatting

* Add dnnl convolution/gemm operators

* Formatting

* Add eliminate_contiguous

* Add faster pointwise operators

* Formatting

* Formatting

* Formatting

* Add dnnl op class

* Formatting

* Add add op

* Formatting

* Add concat operator

* Formatting

* Add more ops

* Create descriptor during finalization

* Formatting

* Don't rewrite pooling

* Enable memory coloring

* Formatting

* Add output aliases

* Formatting

* Fix errors

* Formatting

* Convert literals

* Add missing file

* Remove batch_norm

* Formatting

* Use strides

* Formatting

* Add some debug checks

* Formatting

* Fix bug in adjusting shape for gemm

* Formatting

* Fix fallback dot operator

* Zero initialize buffers

* Add support for group convolutions

* Formatting

* Make adjust allocation target independent

* Formatting

* Enable adjust_allocation for gpu/cpu

* Formatting

* Add copy to allocation model

* Formatting

* Add copy operator

* Formatting

* Better handling of output parameters in adjust_allocation

* Formatting

* Build with dnnl

* Make dnnl required

* Fix compile error

* Tidy fixes

* Formatting

* Tidy fixes

* Formatting

* Fix more tidy issues

* Formatting

* Add mul op

* Add mul op

* Set c compiler to clang as well

* Compensate for normalized compute shape

* Formatting

* Fix cppcheck errors

* Formatting

* Add onednn library to hcc

* Guard clang pragmas

* Disable cpu mode for gcc for now

* Leave it enabled for gcc 7

* Fix cppcheck suppression

* Fix compile error on gcc 5

* Remove unused code

Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

406afeb8

11 Nov, 2020 1 commit

Refactor program to module (#684) · 2466dd6f

Shucai Xiao authored Nov 11, 2020



* code backup

* clang format

* change corresponding tool files

* clang format

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

2466dd6f

09 Nov, 2020 1 commit

Add hip compilation (#664) · f71af72a

Paul Fultz II authored Nov 09, 2020



* Add compiler flags

* Add missing include

* Add filesystem header

* Formatting

* Add tmp_dir to run

* Formatting

* Kernel compilation and launching

* Formatting

* Separate pack_args

* Formatting

* Add alignment tests

* Formatting

* Add compile test

* Formatting

* Complete compile test

* Formatting

* Use is_regular_file free function

* Fix is_regular_file call

* Fix tidy issues

* Fix tidy

* Fix tidy issue

* Print size in read_buffer to debug issue on jenkins

* Add hip flags before src file

* Fix reading output files

* Fix unused variable warning

* Formatting

* Formatting

* Disable tidy check

Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

f71af72a

04 Nov, 2020 1 commit

Split cpu and reference implementation (#671) · 500d9441

Paul Fultz II authored Nov 04, 2020



* Add all_targets cmake target

* Rename target

* Add ref target

* Rename tests

* Refactor compiler target

* Formatting

* Verify for every target

* Formatting

* Add verify test suite

* Formatting

* Add initial test programs

* Formatting

* Add rnn tests

* Formatting

* Validate gpu

* Formatting

* Remove old gpu tests

* Fix gpu tests

* Fix ref error

* Fix tidy issues

* Formatting

* Tidy fixes

* Fix header in python api

* Rename to ref

* Use ref in verify_onnx

* Fix tidy issue

* Build with verbose on

* Fix typo

* Remove verbose

* rename some cpu prefix to ref

Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>

500d9441

15 Oct, 2020 1 commit

Added greater and less operators (#660) · 48ffbfa5

turneram authored Oct 15, 2020



* Added greater and less operators

* Fixed ops_test.cpp

* Set commutative to false for less, greater

* Refactored parse_equal/less/greater into parse_compare_op

* Removed unnecessary function attributes() from greater.hpp/less.hpp

* Added op_name arguments

* Removed local settings

* Formatting

* Missing comma

* Formatting

* Formatting

* Formatting

* Formatting

* Formatting

* Missing space

Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

48ffbfa5

09 Oct, 2020 1 commit

Add parallel stream analysis (#629) · 1d98fbb4

Paul Fultz II authored Oct 08, 2020

* Add initial multi stream analysis

* Formatting

* Add more tests

* Formatting

* Remove comment

* Analyze streams on the gpu

* Formatting

* Fix nstream

* Formatting

* Add test for return

* Formatting

* Make sure return has a stream assignment

* Formatting

* Fix asserts and checks

* Improve error message for out-of-order sequence

* Formatting

1d98fbb4

08 Oct, 2020 1 commit

Add build flag for fast math (#639) · a5065265

kahmed10 authored Oct 08, 2020



* add flag

* formatting

* remove env variable

* fix api expression

* add api test

* add api test

* add op test

* formatting

* fix function name

* fix syntax

* formatting

* modify test

* remove test and update doc

* move test to new file

* formatting

* revert test files

* rewrite check

* New

Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

a5065265

30 Sep, 2020 1 commit

Add hip clang builds to jenkins (#651) · f28a62ea

Paul Fultz II authored Sep 30, 2020

* Make global variables const

* Tidy fixes

* Disable some lints

* Formatting

* Fix tidy const

* Formatting

* Add missing const keywords

* Formatting

* More fixes

* Fix remaining tidy issues

* Formatting

* Fix rocblas function call

* Formatting

* Fix nodiscard warnings

* Formatting

* Use named parameters

* Remove overload

* Add overload

* Remove noncps

* Use named param for node

* Add auto register header

* Use named parameters

* Refactor jenkinsfile

* Fix shadow

* Add missing body variable

* Add more const methods

* Add hip-clang docker builds

* Remove comments

* Add clang-format

* Add more const

* Formatting

* Rename stage

* Disable check

* Add another const

* Add python 2 dev packages

* Add sphinx to dockerfile

f28a62ea

31 Aug, 2020 1 commit

Pooling ceil mode (#615) · 9dabe26b

Shucai Xiao authored Aug 31, 2020



* support pooling ceil_mode

* clang format

* add unit test for pooling ceil mode

* clang format

* fix review comments

* clang format

* add more unit tests and fixed a bug in cpu pooling implementation

* clang format

* add one more unit test

* clang format

* fix cppcheck error

* fix cppcheck error

* fix cppcheck error

* fix review comments

* clang format

* remove the padding_mode attribute in pooling

* clang format

* clang format

* fix review comments

* clang format

* fix a cppcheck error

* fix review comments

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

9dabe26b

27 Aug, 2020 2 commits

Context serialization (#607) · 6e1f9f20

Shucai Xiao authored Aug 27, 2020



* Add initial serialization

* Formatting

* Add unit tests

* Formatting

* Add tests for serialization

* Formatting

* Use or not and

* Add value test

* Formatting

* Add more tests

* Add shape serialization

* Formatting

* Add serialization for literal and argument

* Formatting

* Add from and to value to operation

* Formatting

* Serialize empty types

* Formatting

* Tidy fixes

* Formatting

* Fix tidy issues

* Formatting

* Reformat value type macro

* Formatting

* Handle enum types

* Formatting

* Use const ref

* Update

* Add tests for to_value/from_value

* Formatting

* code backup

* clang format

* code backup

* clang format

* code backup

* clang format

* remove the from/to_value method for the generate context struct

* clang format

* code backup

* Don't print literal data in hip_copy_literal

* clang format

* add unit test to have better coverage

* remove unnecessary code

* remove unnecessary code

* fix review comments

* clang format

* fix review comments

Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

6e1f9f20

Bool type and equal operator (#603) · 59b80d4e

Shucai Xiao authored Aug 27, 2020



* add bool type

* code backup

* code backup

* clang format

* fix build warnings

* clang format

* add the equal operator

* add the equal operator

* clang format

* remove unnecessary code

* refine unit tests

* clang format

* fix review comments and a bug

* clang format

* additional changes

* clang format

* fix cppcheck error

* add bool type in c api

* fix cppcheck error

* fix review comments

* fix cppcheck error

* fix a build error related to gcc

* fix cppcheck error

* fix cppcheck error

* added the equal operator to register list

* add parsing boolean type

* clang format

* fix bool type issue for python output

* clang format

* add support for automatic multibroadcast of the equal operator

* additional unit tests for more code coverage

* clang format

* missing an onnx file

Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

59b80d4e

25 Aug, 2020 1 commit

Improve layernorm performance (#613) · 56b3bf58

Paul Fultz II authored Aug 25, 2020

* Use increment instead of division to compute register offset

* Formatting

* Limit layernorm to 1024 elements

* Formatting

* Add verification to driver

* Formatting

* Remove early return

* Use block_size 256

* Vectorize the kernel

* Formatting

* Convert to vector type

* Add layernorm tests

* Formatting

* Formatting

* Refactor layernorm to run both algos

* Formatting

* Fix compile error

* Fix tidy warnings

* Formatting

* Add layernorm function

* Formatting

56b3bf58

14 Aug, 2020 1 commit

Layernorm onnx support (#599) · 2c5d5fee

kahmed10 authored Aug 14, 2020



* fix pad calc

* bert tf passes correctness

* formatting

* add test

* formatting

* remove comment

* add inline

* formatting

* fix order for literal

* formatting

* test no mul_add

* formatting

* debug layernorm

* debug layernorm

* manual merge

* more progress

* formatting

* remove miopen batchnorm

* remove headers

* Fix compile error with no dpp reductions

* fix indices

* formatting

* change matcher

* formatting

* remove binds

* formatting

* disable tf matcher

* formatting

* use fast div

* formatting

* fix matcher

* formatting

* remove comment

* move find_matches

* add assert

* formatting

* fix deepcode issue

Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

2c5d5fee

21 Jul, 2020 1 commit

Rewrite avepooling bug (#584) · 9169fbb3

Shucai Xiao authored Jul 21, 2020



* fix a bug in rewrite_pooling pass

* clang format

* add unit tests for rewrite_pooling

* clang format

* add rewrite pooling to support maxpooling

* clang format

* remove a redundant unit test

* add one more unit test

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

9169fbb3

16 Jul, 2020 1 commit

Nd deconv cpu (#565) · 98ade977

kahmed10 authored Jul 16, 2020



* initial progress

* formatting

* check existing tests

* formatting

* change for loop to transform

* formatting

* add tests

* formatting

* remove comment

* add more tests

* update gpu miopen calls

* formatting

* initial progress

* add cpu impl and tests

* formatting

* add NOLINT

* add 3d test

* formatting

* add more op_shape tests

* fix error msg

* fix bounds

* formatting

* fix algorithm

* formatting

* pin numpy version

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

98ade977