Commits · db63cc7720d35efb4aa2d7f655b95249f53fc2e1 · gaoqiong / MIGraphX

08 Jun, 2023 1 commit
- Add initial CK integration plus auto-tuning for kernels (#1791) · 25af8710
  Paul Fultz II authored Jun 08, 2023
```
Enable with MIGRAPHX_ENABLE_CK=1 and --exhaustive-tune tune flag
```
  25af8710
06 Jun, 2023 1 commit

Conditionally enable GeLU approximation (#1810) · c5d0c5b6

Umang Yadav authored Jun 05, 2023

Sigmoid approximation for GeLU was introduced in #1299 for Fp16. The sigmoid approximation is known to get better perf but lower accuracy. https://arxiv.org/pdf/1606.08415.pdf

c5d0c5b6

19 May, 2023 1 commit
- Enabling native int32 type support (#1721) · 8d9d5d1c
  Zhuoran Yin authored May 19, 2023
```
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
```
  8d9d5d1c
04 May, 2023 1 commit

[mlir] Adding quant convolution fusion as anchor op (#1683) · 7f105952

Zhuoran Yin authored May 03, 2023

Exposed the mlir_enabled() call the decide for lowering pipeline's enablement
Disabled the rewrite quantization pipeline in mlir compilation
Added quant convolution as anchor ops
Fixed the return type expectations
Added the fall back hip implementation for quantizelinear and dequantizelinear
Will need advises to improve the implementation for quantizelinear

7f105952

28 Apr, 2023 1 commit
- Removed split_single_dyn_dim compile flag (#1711) · bcc1f64a
  Charlie Lin authored Apr 28, 2023
  
  bcc1f64a
21 Apr, 2023 1 commit
- disable fusion only but create pointwise modules (#1706) · 2a44dfe9
  Umang Yadav authored Apr 21, 2023
  
  2a44dfe9
06 Apr, 2023 1 commit
- Add reduction fusion (#1614) · f201285c
  Paul Fultz II authored Apr 05, 2023
```
Automatically fuse multiple reductions and pointwise operations.
```
  f201285c
03 Apr, 2023 1 commit

promote_literals pass (#1593) · e3fb3a0d

Charlie Lin authored Apr 03, 2023

Adds the promote_literals compiler pass that moves literals from the submodules to the main module.
With the eliminate_common_subexpression pass, it will remove copies of literals created during split_single_dyn_dim.
Pass is enabled with the split_single_dyn_dim compile option.

e3fb3a0d

31 Mar, 2023 1 commit

Split single dynamic dimension compiler pass (#1580) · e9e3eacc

Charlie Lin authored Mar 30, 2023

Adds a new GPU compiler pass split_single_dyn_dim that handles when one input parameter has a single non-fixed dynamic_dimension.
commonly occurs for dynamic batch or BERT sequence length
Splits the dynamic shape into several submodules will static input parameters to handle all of the cases in the dynamic_dimension range.
Essentially does what I manually did for the select_module verify tests
Adds a compile option split_single_dyn_dim that toggles the pass on/off. Defaults to false.
Updates verify_program.hpp and run_verify.cpp to allow for the tests to change the compile_options

e9e3eacc

18 Mar, 2023 1 commit
- Dynamically plug-in backend target libs (#1608) · 7a7040aa
  Umang Yadav authored Mar 18, 2023
```
Fixes #1595
```
  7a7040aa
16 Feb, 2023 1 commit

Add flag for tuning in migraphx-driver (#1519) · cc098f4d

Umang Yadav authored Feb 15, 2023

* Add driver flag "--exhaustive-tune" to enable tuning, add support for the same in C/C++ and python API

cc098f4d

31 Jan, 2023 1 commit
- Add a general optimize pass (#1491) · a4b82653
  Paul Fultz II authored Jan 30, 2023
```
* Add general optimize pass
* Fuse gemm multiplies by scalar
* Handle zero epsilon
```
  a4b82653
29 Nov, 2022 1 commit

remove extra adjust allocation pass (#1477) · 5a2a83a4

kahmed10 authored Nov 30, 2022

Merging #1391 caused an extra adjust allocation pass for GPU targets. This removes that merge error.

5a2a83a4

02 Nov, 2022 1 commit
- Add nhwc layout to gpu backend (#1391) · 1820198e
  Paul Fultz II authored Nov 02, 2022
```
Can be enabled via environment variable MIGRAPHX_ENABLE_NHWC
```
  1820198e
26 Oct, 2022 1 commit
- rearrange default pass list; adjust_allocation must be run after rep… (#1418) · 7b9ce460
  Brian Pickrell authored Oct 26, 2022
```
Fixes an observed regression error on certain Frozen Protobuf models due to PR 1280
```
  7b9ce460
13 Oct, 2022 1 commit

Rewrite TF batch norm; remove batch_norm_inference (#1371) · be309bfb

Charlie Lin authored Oct 13, 2022

Rewrites the TF batch norm like operators to other MIGX operators
Removes the code related to batch_norm_inference

be309bfb

31 Aug, 2022 1 commit

Add pass to rewrite gelu as fast gelu (#1299) · 794a4335

turneram authored Aug 31, 2022

Rewrite_gelu pass replaces the gelu formula of x * (1/2) * (1 + erf(x/sqrt(2))) with the sigmoid approximation of x * Sigmoid(x * 1.702)

794a4335

27 Aug, 2022 1 commit

Improvements to handling and add constant passed to dot operator (#1280) · 8752875a

Paul Fultz II authored Aug 26, 2022

This will rewrite dot operators like X(Y + b) to XY + Xb when b is constant as we can fold the add away.
This improves handling pointwise with broadcasted operators, this helps improves const propagation.
Improve gemm fusion with a mul_add
Improve support for broadcast shapes in gemm

8752875a

12 Jul, 2022 1 commit
- Use current device when constructng context (#1294) · 68189043
  Paul Fultz II authored Jul 11, 2022
  
  68189043
03 Jul, 2022 1 commit

Add mlir fusion (#1251) · ca8a54fe

Paul Fultz II authored Jul 03, 2022

* Add mlir c api

* Formatting

* Create a type attribute

* Formatting

* Parse module

* Formatting

* Add mlir dump function

* Add test case

* Formatting

* Fix tidy issues

* Update mlit version

* Update to newer mlir

* Format

* Move mlir to the gpu and update the test

* Formatting

* Fix bug when appending module

* Format

* Remove old cmake flag

* Update message

* Add return

* Format

* Add mlir_compile

* Format

* Register dialect

* Handle unsinged integers

* Dont provide output for return instruction

* Format

* Add code to insert memrefs

* Format

* Add mlir verification

* Formatting

* Enable pointwise_fusion

* Disable eliminate_data_type

* Set kernal name

* Format

* Fix device name

* Formatting

* Fix output arg

* Format

* Updates

* Upate hash

* Add fuse_mlir pass

* Format

* Add fuse mlir

* Format

* Update mlir

* Sort parameter names

* Format

* Reenable disabled passes

* Remove old mlir conv

* Remove asym default padding

* Add more verbose tracing

* Format

* Fix compilation errors

* Format

* Whitelist operators

* Format

* Add namespace

* Format

* Update triple

* Format

* Use func dialect

* Format

* Use func.return

* Format

* Upgrade mlir version

* Add comment

* Handle symetrical padding

* Format

* Cleanup debug output

* Format

* List failed tests

* Move mlir compile to jit pipeline

* Format

* Update version

* Add source locations

* Format

* Correctly add module

* Format

* Update failed tests

* Fix failures when mlir is disabled

* Format

* Update mlir version

* Check type for fp32

* Format

* Remove failed test

* Update mlir in driver

* Tidy fixes

* Foramt

* Tidy fixes

* Format

* Fix const

* Remove from requirements

* Fix cmake version

* Fix tidy warning

* Use another ifdef

* Fix tidy

* Other tidy fix

* Format

* Update hash

* Add missing license files

* Format

* Format

* Fix fnction name

ca8a54fe

23 Jun, 2022 1 commit
- remove eliminate_workspace pass (#1254) · f5760e21
  kahmed10 authored Jun 23, 2022
```
* remove eliminate workspace
* remove sync device and other tags
```
  f5760e21
22 Jun, 2022 1 commit
- Update license files (#1248) · e44cecbc
  Ted Themistokleous authored Jun 22, 2022
```
Updated each source file in the repo with the existing license.
```
  e44cecbc
17 Jun, 2022 1 commit

Create allocate op and replace_allocate pass (#1183) · add6fb3b

kahmed10 authored Jun 17, 2022



* add allocate op header

* formatting

* add replace_allocate pass

* formatting

* move output param to remove_allocate pass

* formatting

* fix bugs in replace_allocate pass

* formatting

* fix verify if tests

* formatting

* move if op logic

* formatting

* cleanup lowering

* cleanup lowering

* formatting

* fix tidy

* formatting

* fix tidy

* add cpu allocate check

* formatting

* change cpu allocate in pass

* formatting

* add some tests for replace_allocate pass

* formatting

* pass by ref

* fix run_pass

* formatting

* update variable name for module

* update dce to use contains() and fix tidy

* formatting

* update cppcheck

* add if test

* formatting

* add if test

* rename var to mod_output_names

* formatting

* remove conditional

* update allocate op and tests

* formatting

* update replace_allocate tests

* update create_output_names() and conditional in replace_allocate

* formatting

* remove extra variable in replace_allocate

* update tools script for allocation_model
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

add6fb3b

11 May, 2022 1 commit

Prefuse layernorm for gpu (#1190) · 671f24be

Paul Fultz II authored May 11, 2022

Fuse layernorm and added triadd_layernorm fusion.  This is a prep performance booster

671f24be

09 Feb, 2022 1 commit
- Enable pointwise fusion by default (#1082) · c7419a9c
  Paul Fultz II authored Feb 09, 2022
```
There is now a MIGRAPHX_DISABLE_POINTWISE_FUSION to disable it
```
  c7419a9c
11 Nov, 2021 1 commit

Conditionally enable pointwise fusion (#992) · 157935ff

Paul Fultz II authored Nov 10, 2021

This enables the pointwise fusions using the MIGRAPHX_ENABLE_POINTWISE_FUSION env variable. Its disabled by default since MIOpen fusions need to be refactored.

This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math passes with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.

157935ff

08 Oct, 2021 1 commit

Remove alpha and beta from `dot` and `quant_dot` (#961) · 21193e87

Umang Yadav authored Oct 08, 2021

Previously dot operator was defined as C = alpha * A . B + beta * C where * is scalar multiplication and . is dot product or matrix multiplication depending on dimension of the inputs.

Aim is to have the definition of dot operator as C = A . B without having alpha or beta.

In order to achieve the same effect as alpha and beta (1) it multiplies the one of the inputs to the dot operator with alpha value. (2) if beta is present then, multiplies the C with beta and then adds into the output from step 1.

21193e87

17 Sep, 2021 2 commits

Revert "Remove alpha and beta attributes from dot operator (#945)" (#957) · 985f58b0
Paul Fultz II authored Sep 17, 2021
```
This reverts commit 9e43cb8b.
```
985f58b0

Remove alpha and beta attributes from dot operator (#945) · 9e43cb8b

Umang Yadav authored Sep 17, 2021

This PR aims to remove alpha and beta attributes from dot operator completely.

Previously dot operator was defined as C = alpha * A . B + beta * C where * is scalar multiplication and . is dot product or matrix multiplication depending on dimension of the inputs.

Aim is to have the definition of dot operator as C = A . B without having alpha or beta.

9e43cb8b

10 Sep, 2021 1 commit
- Assert the shape for compute and compute_shape are the same (#936) · 8b4c69c5
  Paul Fultz II authored Sep 10, 2021
```
Assert shapes dont change
Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
```
  8b4c69c5
18 Aug, 2021 1 commit

Optimize Q/DQ Format Pass (#889) · 0b5f33b6

turneram authored Aug 18, 2021

* Add operators, refactor parsers, add rewrite passes, add tests

* Add ref implementations

* Move broadcasting of scales and zero points to onnx parser

* Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type

* Switch certain variables to int64_t

* Fix overflow in implicit constant conversion

* Remove operators.hpp from includes in tf_test.cpp

* Add conversion for int32 input to quantizelinear and add test case; remove operators.hpp from onnx_test.cpp includes

* Switch dequantizelinear math from int32 to float

* Remove changes to operators.hpp

* Simplify apply_quantizelinear

* Add verify test for int32 data

* Add rewrite_quantization back to CMakeLists

* Add passes to insert qdq after add_bias is applied, replace quant_ops, and remove remaining qdq pairs

* Renaming, refactoring, cleaning up code, adding formal test, and adding passes to targets

* Renaming, review comments, begin adding more specific tests

* Add more specific unit tests

* Fix failing test on CI

* Correct matcher and update qop rewriting, update tests and add more tests

* Update matcher, clean up simplify_qdq, tweak tests

* Add tests, remove pass from CPU target, update dot parameters, clean up simplify_qdq

* Fix correctness bug in ref q/dq implementations; edit gemm parser to make beta always 0.0

* Remove unused variables in onnx gemm tests

0b5f33b6

15 Jul, 2021 1 commit

Quantize linear ops (#843) · 3282e01a

turneram authored Jul 15, 2021

* Add operators, refactor parsers, add rewrite passes, add tests

* Formatting

* Fix cppcheck

* Review comments

* Formatting

* Combine rewrite passes

* Formatting

* Add ref implementations

* Formatting

* Review comments

* Formatting

* Tidy warnings

* Apply review comments

* Formatting

* Fix CI error

* Formatting

* Increase code coverage

* Formatting

* Move broadcasting of scales and zero points to onnx parser

* Formatting

* Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type

* Formatting

* Increase code coverage

* Formatting

* Switch certain variables to int64_t

* Formatting

* Fix overflow in implicit constant conversion

* Formatting

* Increase code coverage

* Formatting

* Remove operators.hpp from includes in tf_test.cpp

* Formatting

* Add conversion for int32 input to quantizelinear and add test case; remove operators.hpp from onnx_test.cpp includes

* Formatting

* Switch dequantizelinear math from int32 to float

* Formatting

* Remove changes to operators.hpp

* Simplify apply_quantizelinear

* Formatting

* Add verify test for int32 data

* Add rewrite_quantization back to CMakeLists

3282e01a

08 Jul, 2021 1 commit

Preallocate parameters on the CPU and unify preallocations (#840) · 427fc25c

Paul Fultz II authored Jul 08, 2021



* Add preallocate method

* Add preallocate_param pass

* Preallocate buffers on the cpu

* Formatting

* Preallocate on the gpu

* Add missing cpp file

* Formatting

* Add lifetime function

* Formatting

* Always allocate

* Fix tidy warning

* Add const

* Add missing lifetime annotations
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

427fc25c

27 Jun, 2021 1 commit

Inline subgraph (#802) · bc52a8a8

Shucai Xiao authored Jun 27, 2021



* Add definitions for all pointwise operators

* Formatting

* Add cpp generator class

* Formatting

* Move compilation to core

* Formatting

* Add clock to tmp name

* Add dynamic loader

* Formatting

* Add tests for code gen

* Formatting

* Add test for literals

* Formatting

* Use with_char

* Add missing header

* Fix mismerge

* Ignore tidy warning

* Fxx gcc 5 errors

* Apply fixits

* Skip signed bitwise of status

* Remove unused parameters

* Explicitly add c++14 flag

* Fix tidy warning

* unify the compute function signature

* clang format

* make another change

* unify the compute function

* clang format

* remove unnecessary code

* more refinement about the operator compute funciton

* clang format

* add an overload function

* clang format

* add support for axes inputs for sequeeze/unsqueeze/reduce_sum

* clang format

* fix build problems

* backup code changes

* clang format

* Add tuple type to shape class

* Formatting

* fix a bug in parsing quantizelinear operator

* clang format

* fix a cppcheck error

* disable different versions of unit tests for different onnx version

* clang format

* upgrade onnx to 1.8

* update onnx to 1.8.1

* disable two more real models

* clang format

* Make data member private

* Formatting

* Add sub arguments

* Formatting

* Trun clang format off

* Disable clang-format

* fix review comments

* fix the function of assign axes in parsing the squeeze operator

* add unit tests and fix a bug

* clang format

* fix review comments

* clang format

* fix a build error

* backup code changes

* clang format

* add more unit tests and add parsing opset version

* clang format

* Improve visiting tuples

* Formatting

* fix cppcheck error

* adding installing the onnx package

* resolve no protobuf compiler

* add an inline subgraph pass

* clang format

* Add more argument tests

* Formatting

* Handle tuple in load

* Formatting

* code backup

* clang format

* Remove .o files

* Add tuple type to api

* Formatting

* fix build errors

* clang format

* code backup

* code backup

* add unit tests for the inline subgraph

* clang format

* refine the inline subgraph and parse if operator

* clang format

* fix cppcheck issue

* clang format

* add unit test for inline subgraph pass

* clang format

* fix format issue

* remove the context from the if operator

* clang format

* simplify the compute functions

* Fix tidy warnings

* fix cppcheck error

* clang format

* fix cppcheck error

* Fix tidy warnings

* fix a cppcheck error

* clang format

* Add a test for share method

* Formatting

* Add a test cpp_type

* add unit tests for more code coverage

* clang format

* add unit tests to have more code coverage

* clang format

* try a comment in jenkins build

* include the install onnnx line

* code backup

* reorder the dependenciesd installed

* refine dockerfile

* fix review comments

* clang format

* remove unnecessary overload function

* fix cppcheck error

* change back the argument test

* Suppress tidy warning

* add the operator get_tuple_elem

* clang format

* add get_tuple_elem to operator include file

* chang if to support multiple operation outputs

* clang format

* optimize inline subgraph

* clang format

* code backup

* clang format

* fix bug

* refine unit tests for tuple output of the if operator

* clang format

* refine a instruction replacement code

* add a unit test and sort all the unit tests alphabetically

* fix cppcheck error

* add more unit tests for multiple op outputs

* clang format

* fix cppcheck error

* Update pass manager to get modules after every pass

* more unit test to cover more scenarios

* clang format

* fixed a bug in a unit test

* add more tests

* clang format

* add more unit tests to have more code coverage

* fix a bug in a unit test

* Add program overload for module

* Formatting

* Hash modules for quicker lookup of modules

* Bump file version

* Add methods to remove modules

* Formatting

* add the tuple type to the support list

* Eliminate unused modules

* Formatting

* Fix test errors

* Foramtting

* Fix tidy issues

* fix problem related to inline subgraph

* clang format

* fix review comments

* fix review comments

* fix review comments

* fix review comments

* clang format

* fix a unit test

* one more code change

* remove an optimization related to the if operator

* clang format

* fix review comments
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

bc52a8a8

09 Jun, 2021 1 commit

Asym pad refactor (#791) · 9a5e0c06

kahmed10 authored Jun 09, 2021



* alternative impl

* formatting

* add gpu pass to insert pad

* formatting

* update onnx test, still need cleanup

* formatting

* update tf_test

* modify existing tests

* formatting

* remove print

* code cleanup

* formatting

* code cleanup

* formatting

* fix tidy and cppcheck

* remove variable

* add test

* formatting

* add test and address comments

* formatting
Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

9a5e0c06

03 May, 2021 1 commit

Reduce types generated for hip kernels (#814) · 3becd974

Paul Fultz II authored May 03, 2021



* Remove unused data types

* Formatting

* Reduce types generated for hip kernels

* Formatting

* Fix onnx tests

* Formatting
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

3becd974

29 Apr, 2021 1 commit

MLIR MIOpen Dialect integration (phase 1) (#768) (#769) · 56584fa2

SJW authored Apr 29, 2021



* MLIR MIOpen Dialect integration (phase 1) (#768)

* Added Findmlir.cmake (using environment variables to import)

* Added mlir_conv pass to GPU target

  * Apply to any gpu::convolution if supported by MLIR

  * Call MLIR C-API to generate iGEMM kernel with configuration from gpu::convolution

  * Capture binary in dictionary for matching convolutions

  * Build a code_object_op with the binary and execution dimensions

  * Substitute for the gpu::convolution

* Changed the parameters for the code_object to reflect the generated MLIR kernel

* Expanded out MemRefDescriptor fields in param list

* Also updated for MLIR C-API changes

* * fixed global_size calculation

* MLIR MIOpen Dialect integration (phase 1) (#768)

* Added Findmlir.cmake (using environment variables to import)

* Added mlir_conv pass to GPU target

  * Apply to any gpu::convolution if supported by MLIR

  * Call MLIR C-API to generate iGEMM kernel with configuration from gpu::convolution

  * Capture binary in dictionary for matching convolutions

  * Build a code_object_op with the binary and execution dimensions

  * Substitute for the gpu::convolution

* Changed the parameters for the code_object to reflect the generated MLIR kernel

* Expanded out MemRefDescriptor fields in param list

* Also updated for MLIR C-API changes

* * Added command line option: --enable_mlir

* * fixed command line switch

* updated for new MLIR API changes

* * Added cget llvm-project-mlir to import MIIR API libraries into Dockerfile
  * removed cmake Findmlir

* updated for changes in MIIR C-API

* * updated CMakeLists.txt to allow disable of MLIR import

* fixed memory leaks and removed copies

* updated for 5D memrefs

* * formatting

* * fixed review comments

* * fixed merge issues

* hip gcnDeviceName now includes specifiers at the end
  * use major/minor values instead

* * disable MLIR by default

* * removed command-line switch --enable-mlir

* * fix unused when MLIR disabled

* * enable jenkins enable/test MLIR

* * format

* * fixed clang-tidy

* * added new type
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

56584fa2

26 Feb, 2021 1 commit

Add more supported operators and optimizations for the cpu backend (#746) · a0b570b2

Paul Fultz II authored Feb 26, 2021



* Add eliminate_data_type pass

* Formatting

* Auto convert quant ops

* Formatting

* Flip the order of decompose

* Compute max size differently

* Formatting

* Clamp values in convert

* Formatting

* Fix loss of precision in reduce

* Formatting

* Fix bugs in reduction

* Fix accumulator type in reference softmax implementation

* Formatting

* Update convert test

* Remove unused variables

* Remove unnecessary quant_dot check

* Formatting

* Add tests

* Formatting

* Remove unused code

* Remove duplicate ops

* Remove blaze dependency

* Use set since shape::type_t is no hashable on gcc 5

* Formatting

* Add dnnl binary op

* Formatting

* Add binary and eltwise

* Formatting

* Add softmax

* Formatting

* Remove unused operators

* Add missing files

* Formatting

* Add lrn

* Formatting

* Add deconvolution

* Formatting

* Change allocate default

* Add reorder

* Formatting

* Add reductions

* Formatting

* Sort lines

* Change literals in another loop

* Add pow operator

* Formatting

* Add pow operator

* Formatting

* Make sure shapes are packed

* Allow broadcasted inputs

* Remove unused operators

* Simplify functions

* Remove softmax

* Add sub and erf functions

* Formatting

* Fix bug

* Formatting

* Improve parallism

* Formatting

* Allow multiple batch dimensions

* Formatting

* Move literal transforms out of lowering

* Formatting

* Add gather operator

* Sort lines

* Add early exit for carry

* Formatting

* Add missing concat

* Rename macro

* Fix deep nesting

* Formatting

* Fix cppcheck issues

* Remov else

* Move attribute to typedef

* Formatting

* Disable maybe-uninitialized warning since its broken on gcc

* Add constexpr default constructor

* Formatting

* Fix compiler warnings

* Fix adjust_allocation test
Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

a0b570b2

05 Feb, 2021 1 commit

Normalize compute methods (#723) · 549cfe72

Paul Fultz II authored Feb 05, 2021



* Normalize compute functions

* Formatting

* Save normalization flag to the file

* Formatting

* Remove tuned functions

* Formatting

* Use in_index
Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

549cfe72

14 Dec, 2020 1 commit

Use dnnl for cpu backend (#688) · 406afeb8

Paul Fultz II authored Dec 14, 2020



* Add flag to enable cpu backend

* Make buffers shared

* Enable optimizations

* Add onednn

* Formatting

* Formatting

* Add dnnl header

* Formatting

* Rewrite rnn first

* Formatting

* Call reference implementation

* Formatting

* Make literal data shared

* Formatting

* Add convolution

* Formatting

* Compensate for dilation

* Formatting

* Use name/make_op instead

* Formatting

* Rename gemm header

* Formatting

* Add dnnl convolution/gemm operators

* Formatting

* Add eliminate_contiguous

* Add faster pointwise operators

* Formatting

* Formatting

* Formatting

* Add dnnl op class

* Formatting

* Add add op

* Formatting

* Add concat operator

* Formatting

* Add more ops

* Create descriptor during finalization

* Formatting

* Dont rewrite pooling

* Enable memory coloring

* Formatting

* Add output aliases

* Formatting

* Fix errors

* Formatting

* Convert literals

* Add missing file

* Remove batch_norm

* Formatting

* Use strides

* Formatting

* Add some debug checks

* Formatting

* Fix big in adjusting shape for gemm

* Formatting

* Fix fallback dot operator

* Zero initialize buffers

* Add suport for group convolutions

* Formatting

* Make adjust allocation target independent

* Formatting

* Enable adjust_allocation for gpu/cpu

* Formatting

* Add copy to allocation model

* Formatting

* Add copy operator

* Formatting

* Better handling of output parameters in adjust_allocation

* Formatting

* Build with dnnl

* Make dnnl required

* Fix compile error

* Tidy fixes

* Formatting

* Tidy fixes

* Formatting

* Fix more tidy issues

* Formatting

* Add mul op

* Add mul op

* Set c compiler to clang as well

* Compensate for normalized compute shape

* Formatting

* Fix cppcheck errors

* Formatting

* Add onednn library to hcc

* Guard clang pragmas

* Disable cpu mode for gcc for now

* Leave it enabled it for gcc 7

* Fix cppcheck suppresion

* Fix compile error on gcc 5

* Remove unused code
Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

406afeb8