Commits · e5a33aadba97664db3437a7033d691cde04b53dd · gaoqiong / MIGraphX

19 May, 2023 1 commit
- Enabling native int32 type support (#1721) · 8d9d5d1c
  Zhuoran Yin authored May 19, 2023
```
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
```
24 Apr, 2023 1 commit

Dynamic shape hip::copy_to_gpu and hip::copy_from_gpu (#1694) · 84acaea0

Charlie Lin authored Apr 24, 2023

Updates the hip::copy_to_gpu and hip::copy_from_gpu operators to work with dynamic shapes

Allows for offload_copy to be used with dynamic batch

Changed an assert in select_module because, given how offload_copy works with dynamic batch, the argument may now be smaller (the maximum buffer size is used).


16 Feb, 2023 1 commit
- Remove HCC (#1546) · bfd77388
  Umang Yadav authored Feb 16, 2023
```
* deprecate HCC
```
31 Jan, 2023 1 commit

hipRTC fixes (#1531) · 91cc7242

Umang Yadav authored Jan 31, 2023

Added a CMake flag for hipRTC: MIGRAPHX_USE_HIPRTC.
Added stages in Jenkins for hipRTC.
Fixed some of the pending hipRTC issues.


23 Sep, 2022 1 commit
- Remove unused device functions (#1394) · 8ea8473d
  Paul Fultz II authored Sep 23, 2022
```
* Remove device functions
* Update tests
```
06 Sep, 2022 1 commit
- Enable cppcheck rule for 'not', 'or' keywords (#1361) · d37a4df9
  Paul Fultz II authored Sep 06, 2022
```
Using not and or improves readability. The cppcheck rule will help ensure we are doing it consistently.
```
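As a quick illustration of the convention this rule enforces: `not`, `and`, and `or` are standard C++ alternative tokens for `!`, `&&`, and `||`. The predicate below is a made-up example written in that style; `is_valid_index` is not a MIGraphX function.

```cpp
#include <vector>

// Hypothetical predicate written with the alternative tokens
// `not` and `or` instead of `!` and `||`.
bool is_valid_index(const std::vector<int>& v, long i)
{
    return not(i < 0 or i >= static_cast<long>(v.size()));
}
```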
22 Jun, 2022 1 commit
- Update license files (#1248) · e44cecbc
  Ted Themistokleous authored Jun 22, 2022
```
Updated each source file in the repo with the existing license.
```
11 Apr, 2022 1 commit

fix a bug in create tensor_view with vec data type (#1155) · 3c301efa

Shucai Xiao authored Apr 11, 2022

When creating a tensor_view with a vector data type, the last dimension of the shape must be divided by vec_size.

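A minimal sketch of the shape adjustment described above, assuming a row-major shape stored as a list of dimension lengths. `vectorized_lens` is a hypothetical helper, not MIGraphX's tensor_view API; it only shows why the last dimension, and no other, is divided by `vec_size`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the lens a view should use when its elements
// are read as vectors of `vec_size` scalars. Only the innermost dimension
// shrinks, because each vector packs `vec_size` consecutive scalars along it.
std::vector<std::size_t> vectorized_lens(std::vector<std::size_t> lens, std::size_t vec_size)
{
    if(lens.empty() or vec_size == 0 or lens.back() % vec_size != 0)
        throw std::invalid_argument("last dimension must be divisible by vec_size");
    lens.back() /= vec_size;
    return lens;
}
```

For example, a `{2, 3, 8}` float shape viewed as 4-wide vectors becomes `{2, 3, 2}`.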

18 Mar, 2022 1 commit

Complete GPU implementation of CumSum op (#1094) · 548783c8

turneram authored Mar 18, 2022

Add exclusive and reverse modes to gpu implementation of prefix_scan_sum, which completes support for ONNX op CumSum

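For reference, the four CumSum modes can be written as one sequential loop. This is a CPU sketch of the ONNX CumSum semantics on a 1-D input, not the GPU prefix_scan_sum kernel; the `cumsum` function and its boolean flags are illustrative.

```cpp
#include <cstddef>
#include <vector>

// `exclusive` leaves the current element out of its own sum;
// `reverse` accumulates from the end toward the front.
std::vector<float> cumsum(const std::vector<float>& x, bool exclusive, bool reverse)
{
    const std::size_t n = x.size();
    std::vector<float> out(n);
    float running = 0.0f;
    for(std::size_t k = 0; k < n; ++k)
    {
        // Visit indices front-to-back, or back-to-front in reverse mode
        const std::size_t i = reverse ? n - 1 - k : k;
        if(exclusive)
        {
            out[i] = running; // sum of the elements already visited
            running += x[i];
        }
        else
        {
            running += x[i];
            out[i] = running; // sum including element i
        }
    }
    return out;
}
```

For `{1, 2, 3, 4}`, the exclusive forward result is `{0, 1, 3, 6}` and the inclusive reverse result is `{10, 9, 7, 4}`.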

14 Mar, 2022 1 commit
- Increase max groups in kernel (#1120) · d353641d
  Shucai Xiao authored Mar 14, 2022
```
change max number of groups in a kernel to 1B for greater performance
```
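The commit does not include the kernel code. As an assumption, this sketch shows the usual reason a cap on the group count is safe: with a grid-stride loop, a capped launch still visits every element, with each work-item handling several. Both helper names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: one workgroup per `local` elements, but never more
// than `max_groups` workgroups in total. Assumes local > 0.
std::size_t compute_groups(std::size_t n, std::size_t local, std::size_t max_groups)
{
    const std::size_t needed = (n + local - 1) / local; // ceil(n / local)
    return std::min(needed, max_groups);
}

// CPU simulation of a grid-stride loop over n elements: work-item `tid`
// handles tid, tid + stride, tid + 2 * stride, ... Returns the total number
// of element visits, which equals n whenever at least one group runs.
std::size_t elements_covered(std::size_t n, std::size_t local, std::size_t max_groups)
{
    const std::size_t stride = compute_groups(n, local, max_groups) * local;
    std::size_t visits = 0;
    for(std::size_t tid = 0; tid < stride; ++tid)
        for(std::size_t i = tid; i < n; i += stride)
            ++visits;
    return visits;
}
```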
02 Mar, 2022 1 commit
- Clang format ver10 (#1106) · 9852aaef
  bpickrel authored Mar 02, 2022
```
Update the base version of clang-format from 5.0 to 10.0
```
09 Dec, 2021 1 commit

Softmax perf optimization (#1014) · 2e337c7f

Shucai Xiao authored Dec 09, 2021

Changed the number of threads in a block from 256 to 128
Increased the max number of blocks in the kernel from 256 to 1M.
For the case that the axis is the last dimension, we removed the computation of index since it is not required.

With these changes, the softmax op used in the BertSquad model runs about 2x faster than on the develop branch.

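To illustrate the point about the index computation, here is a CPU reference sketch (not the MIGraphX kernel) of softmax along the innermost axis of a row-major buffer. When the reduction axis is last, each row is contiguous, so an element's offset is simply `row * n + j`. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over the last axis of a row-major [rows, n] buffer (n > 0).
// Rows are contiguous, so no general multi-dimensional index is needed.
std::vector<float> softmax_last_axis(const std::vector<float>& x, std::size_t n)
{
    std::vector<float> y(x.size());
    for(std::size_t row = 0; row < x.size() / n; ++row)
    {
        const float* in = x.data() + row * n;
        float* out      = y.data() + row * n;
        // Subtract the row max for numerical stability
        const float m = *std::max_element(in, in + n);
        float sum     = 0.0f;
        for(std::size_t j = 0; j < n; ++j)
        {
            out[j] = std::exp(in[j] - m);
            sum += out[j];
        }
        for(std::size_t j = 0; j < n; ++j)
            out[j] /= sum;
    }
    return y;
}
```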

08 Oct, 2021 1 commit

Nonzero op extension (#870) · 0879b5f1

Shucai Xiao authored Oct 08, 2021

This PR is for the nonzero operator with static output shape.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>


01 Oct, 2021 1 commit

Add multinomial op (#954) · 0b7672d7

turneram authored Oct 01, 2021

Add multinomial op to onnx parser with ref and GPU implementations.

The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative distribution function (CDF). The multinomial operator multiplies the random values by the CDF total (its last element) and returns the index of the first CDF element that is greater than the result. These indices are samples randomly drawn from [0, class_size) that follow the distribution given by the input log-probabilities.

Resolves #821
Co-authored-by: Shucai Xiao <shucai@gmail.com>

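The sampling step described in the commit message can be sketched for a single batch row as follows. This assumes `cdf` is non-empty and already holds the running sum of the (unnormalized) class probabilities, and that each value in `uniform` lies in [0, 1). `multinomial_sample` is a hypothetical name, and the clamp at the end is a defensive choice of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// For each uniform draw u, scale it by the CDF total and return the index
// of the first CDF entry strictly greater than u * total.
std::vector<std::size_t> multinomial_sample(const std::vector<float>& cdf,
                                            const std::vector<float>& uniform)
{
    std::vector<std::size_t> samples;
    const float total = cdf.back();
    for(float u : uniform)
    {
        // upper_bound finds the first element greater than the value
        auto it = std::upper_bound(cdf.begin(), cdf.end(), u * total);
        // Guard against rounding pushing the scaled value up to the total
        if(it == cdf.end())
            it = cdf.end() - 1;
        samples.push_back(static_cast<std::size_t>(it - cdf.begin()));
    }
    return samples;
}
```

With `cdf = {1, 3, 6}` (class probabilities 1/6, 2/6, 3/6), larger draws land in later classes in proportion to those probabilities.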

27 Sep, 2021 1 commit

Dpp opts for wavefront 32 (#951) · 6e2df9de

kahmed10 authored Sep 27, 2021

Checks wavefront size, then changes implementation and number of threads for DPP reduce


16 Sep, 2021 1 commit

Loop operator (#853) · a275f590

Shucai Xiao authored Sep 16, 2021

Add Loop operator for opset version 13.
Notes:
1) The default maximum iteration count is 10 if none is provided.
2) To change it, set max_loop_iterations in the onnx_option struct when parsing a model.
3) The returned shape of the scan output is based on max_loop_iterations even if the actual number of iterations is smaller. This also applies to other operators such as NonZero and NonMaxSuppression. Issue #948 was created to track this so it can be resolved later.
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>


02 Sep, 2021 2 commits

Refactor where op (#918) · ebbaf8fc

turneram authored Sep 02, 2021

Implement the Where operator for the CPU and GPU to improve performance.

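The elementwise behavior of Where is small enough to state as code. This is a reference sketch of the ONNX semantics for equally sized inputs; broadcasting is deliberately omitted, and the `where` template is illustrative rather than the MIGraphX implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// out[i] = cond[i] ? x[i] : y[i] for same-sized inputs (no broadcasting).
template <class T>
std::vector<T> where(const std::vector<bool>& cond, const std::vector<T>& x, const std::vector<T>& y)
{
    if(cond.size() != x.size() or x.size() != y.size())
        throw std::invalid_argument("inputs must have the same size");
    std::vector<T> out(x.size());
    for(std::size_t i = 0; i < out.size(); ++i)
        out[i] = cond[i] ? x[i] : y[i];
    return out;
}
```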

Topk op (#877) · 521b57a2

Shucai Xiao authored Sep 01, 2021



* add topk operator for ref, cpu and gpu
* Hash modules for quicker lookup of modules
* add onnx unit test
* add unit tests for the topk operator
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

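A reference sketch of TopK (largest values, sorted output) on a 1-D input, to show what the ref implementation has to produce. It follows the ONNX rule that, among equal values, the lower index comes first; the `topk` function here is hypothetical and uses `std::partial_sort` so only the first k positions are fully ordered.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Returns the k largest values and their indices, in descending order.
std::pair<std::vector<float>, std::vector<std::size_t>> topk(const std::vector<float>& x, std::size_t k)
{
    k = std::min(k, x.size());
    std::vector<std::size_t> idx(x.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    const auto mid = idx.begin() + static_cast<std::ptrdiff_t>(k);
    // Larger value first; ties broken by the lower index
    std::partial_sort(idx.begin(), mid, idx.end(), [&](std::size_t a, std::size_t b) {
        return x[a] > x[b] or (x[a] == x[b] and a < b);
    });
    idx.resize(k);
    std::vector<float> vals;
    for(std::size_t i : idx)
        vals.push_back(x[i]);
    return {vals, idx};
}
```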

09 Aug, 2021 1 commit
- check for divisor encodable or not, fallback if needed (#906) · a8d86615
  Cagri Eryilmaz authored Aug 09, 2021
```
* check for divisor encodable or not, fallback if needed

* verify test for retinaface case
```
09 Jul, 2021 1 commit
- fix review comments · 4e861c7c
  Shucai Xiao authored Jul 08, 2021
  
08 Jul, 2021 3 commits

Add inclusive scan on the GPU (#872) · 6ba279cc

Paul Fultz II authored Jul 08, 2021



* Add initial scan operator

* Formatting

* Fix with a working test

* Fix bugs

* Formatting

* Formatting

* Simplify

* Formatting

* Use non-power of 2 for test

* Make pointer
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

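The commit does not say which scan algorithm the GPU kernel uses. As an assumption, this CPU simulation shows a Hillis-Steele inclusive scan, a common block-level GPU pattern: in each step, every lane `i >= offset` adds the value `offset` positions to its left, and the offset doubles until it reaches the size, so sizes that are not powers of two work too. The function name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Simulated Hillis-Steele inclusive scan (log2(n) parallel steps).
std::vector<int> inclusive_scan_hs(std::vector<int> v)
{
    for(std::size_t offset = 1; offset < v.size(); offset *= 2)
    {
        // Read from a snapshot, as if all lanes load before any lane stores
        const std::vector<int> prev = v;
        for(std::size_t i = offset; i < v.size(); ++i)
            v[i] = prev[i] + prev[i - offset];
    }
    return v;
}
```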

clang format · 4a24a2dd
Shucai Xiao authored Jul 08, 2021

fix review comments · 0281a411
Shucai Xiao authored Jul 08, 2021


25 Jun, 2021 4 commits
- clang format · 27c0ae08
  Shucai Xiao authored Jun 25, 2021
  
- refine unit tests · dd651742
  Shucai Xiao authored Jun 25, 2021
  
- clang format · 7135da72
  Shucai Xiao authored Jun 25, 2021
  
- gpu implementation of the scatter operator · 973cafd4
  Shucai Xiao authored Jun 25, 2021
  
11 Jun, 2021 1 commit
- Disable dpp for wavefront 32 (#857) · 764b1e44
  Paul Fultz II authored Jun 11, 2021
```
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
```
08 Jun, 2021 1 commit

Reverse Op (#846) · 9c54fc4f

Cagri Eryilmaz authored Jun 08, 2021



* init reverseOp branch: ref op + ref test. WIP

* first passing basic test

* cleanup

* additional axis implementation

* additional test

* ref op implementation vec to int for axis

* ref op test change for axis

* initial gpu files and test

* updates to implementation and test

* fixed some issues

* clang format

* cleanup

* formatting

* removing comments

* remove local size, back to default

* update tests: replace with std functions

* multiple axis for reverse op

* fix a build error

* clang format

* more tests

* fix a bug for the reverse device function

* clang format

* fix a bug

* clang format

* ref test updates, multiaxis

* formatting
Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

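The multi-axis reverse added here can be pictured with a 2-D reference sketch: each output element reads the input at the mirrored coordinate on every reversed axis. This assumes a row-major `rows x cols` buffer and non-negative axes (0 = rows, 1 = columns); `reverse_2d` is a hypothetical helper, not the MIGraphX device function.

```cpp
#include <cstddef>
#include <vector>

// Reverse a row-major rows x cols tensor along the given axes.
std::vector<int> reverse_2d(const std::vector<int>& x, std::size_t rows, std::size_t cols,
                            const std::vector<int>& axes)
{
    bool flip_r = false, flip_c = false;
    for(int a : axes)
    {
        if(a == 0)
            flip_r = true;
        if(a == 1)
            flip_c = true;
    }
    std::vector<int> out(x.size());
    for(std::size_t r = 0; r < rows; ++r)
        for(std::size_t c = 0; c < cols; ++c)
        {
            // Mirror the coordinate on each reversed axis
            const std::size_t sr = flip_r ? rows - 1 - r : r;
            const std::size_t sc = flip_c ? cols - 1 - c : c;
            out[r * cols + c] = x[sr * cols + sc];
        }
    return out;
}
```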

03 May, 2021 1 commit

Reduce types generated for hip kernels (#814) · 3becd974

Paul Fultz II authored May 03, 2021



* Remove unused data types

* Formatting

* Reduce types generated for hip kernels

* Formatting

* Fix onnx tests

* Formatting
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>


05 Mar, 2021 1 commit

Infer outputs in tf (#764) · 197b3a46

kahmed10 authored Mar 05, 2021



* fix relu6

* add more transposes

* add multi output

* formatting

* add tests

* formatting

* fix tests

* change to_nchw for outputs

* add python api

* fix cppcheck

* remove variable

* fix lambda

* add multi_output test

* add more tests and merge

* fix help message

* debugging work

* fix valid op string

* formatting

* manual merge

* mark function as const
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
Co-authored-by: Shucai Xiao <shucai@gmail.com>


26 Feb, 2021 1 commit

changes for not operator (#735) · ebf8bd20

Cagri Eryilmaz authored Feb 26, 2021



* changes for not operator

* changed name of the op from unary_not to not

* Added tests for op and onnx parsing

* reordering not_test in onnx_test.cpp

* not operator -- gpu implementation

* added bool test for not operator

* Added test and missing links for not operator on GPU

* typo fix

* adding .onnx test files for not operator

* formatting
Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>


08 Feb, 2021 1 commit

Add a pass to remove unsupported data types (#738) · 3d24a21c

Paul Fultz II authored Feb 07, 2021



* Add eliminate_data_type pass

* Formatting

* Auto convert quant ops

* Formatting

* Flip the order of decompose

* Compute max size differently

* Formatting

* Clamp values in convert

* Formatting

* Fix loss of precision in reduce

* Formatting

* Fix bugs in reduction

* Fix accumulator type in reference softmax implementation

* Formatting

* Update convert test

* Remove unused variables

* Remove unnecessary quant_dot check

* Formatting

* Add tests

* Formatting

* Remove unused code

* Remove duplicate ops

* Remove blaze dependency

* Use set since shape::type_t is not hashable on gcc 5

* Formatting
Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

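The "Clamp values in convert" item can be illustrated with a saturating conversion. In C++, converting a floating-point value that is out of range for an integer type is undefined behavior, so clamping to the target range before the cast avoids it. This is a sketch under assumptions, not the MIGraphX convert op: `clamp_convert` is hypothetical, it truncates toward zero as `static_cast` does, it maps NaN to zero by its own choice, and it is limited to integer types of at most 32 bits so that the limits are exactly representable as `double`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Saturating float-to-integer conversion: clamp, then truncate.
template <class To>
To clamp_convert(double v)
{
    static_assert(std::is_integral_v<To> and sizeof(To) <= 4,
                  "sketch handles integer types up to 32 bits");
    if(std::isnan(v))
        return To{0}; // hypothetical policy for NaN
    const double lo = static_cast<double>(std::numeric_limits<To>::lowest());
    const double hi = static_cast<double>(std::numeric_limits<To>::max());
    return static_cast<To>(std::clamp(v, lo, hi));
}
```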

19 Jan, 2021 1 commit

Logical ops (#718) · 4d46cbdb

Shucai Xiao authored Jan 19, 2021

* add the and operator

* clang format

* add unit tests for the and operator

* clang format

* change the and name to logical_and and add the logical_or, logical_xor

* clang format

* add onnx unit tests for or and xor

* add more unit tests


08 Jan, 2021 1 commit

Revamp CI infrastructure (#706) · ceb4ca09

Paul Fultz II authored Jan 08, 2021



* Add build and test github workflow

* Fix cget command

* Remove def-requirements.txt

* Add tmate session to debug workflow

* Run tmate session after installing dependencies

* Print date periodically

* Add clang tidy action

* Separate build and run container in two different jobs

* Run bash script

* Remove interactive flag

* Try to mount the files

* Try to use the github workspace

* Without double braces

* Use env variable

* Pipe bash script in

* Run using hip-clang

* Use correct path

* Add verbose

* Remove j flag

* Only run for onnx file to debug

* Manually run clang-tidy

* Remove quiet flag

* Print header file

* Printout environment

* Remove extra defines

* Remove fixits and config flag

* Show ldd

* Add tmate session

* Run onnx protobuf first

* Generate proto for tensorflow

* Update cppcheck version

* Fix some cppcheck issues

* Add const

* Cppcheck fixes

* Formatting

* Fix more cppcheck issues

* Run two jobs

* Cache analysis and run format checking

* Fix yaml issues

* Fix yaml issues

* Fix indentation

* Switch to hip-clang for main docker file

* Use hip-clang in the readme

* Fixes for jenkins

* Use ccache to build

* Combine file

* Set restore keys

* Change stage name

* Build with ccache

* Add missing dependency for ccache

* Build debug with codecov

* Fix workflow syntax

* Fix list

* Use quotes

* Got to correct build path

* Install lcov

* Use sudo

* Echo all commands

* Setup tmate

* Add verbose output

* Build with cmake directly

* Add pthread flag

* Remove python config

* Continue on error

* Use on or off for cmake flag

* Use always upload cache

* Verbose output

* Verbose output from build

* Build one target

* Reduce debug symbols

* Increase garbage collection

* Remove dmesg

* Increase it to 20

* Update rocm cmake version

* Remove jobs from jenkins

* Run on all 3 ubuntus

* Remove gcc 5 jobs

* Don't add flag on 16.04

* Only upload coverage on 18.04

* Don't build for ubuntu 20.04

* Use matrix.os

* Use O2 for hip-clang since lower optimizations are broken

* Use rocm 3.0

* Pass ccache as cmake variable instead of env variable

* Build miopen from source

* Show ccache statistics

* Print log information

* Set compression level

* Use hash dir

* Set hashdir

* Install clang ocl from system

* Up compression level

* Add locale

* Increase cache size to 1G

* Lower compression level to 9

* Remove split dwarf

* Remove Og

* Add back Og

* Separate debug and codecov

* Add missing backslash

* Garbage collect more often

* Add missing locales package

* Use Os

* Install onednn in docker and run tests

* Include target headers in tests

* Increase timeout

* Remove if condition

* Make flag public

* Suppress memory leaks in onednn

* Use equal

* Add gh annotations

* Update rocm-cmake version

* Add ldconfig
Co-authored-by: Shucai Xiao <shucai@gmail.com>


20 Nov, 2020 1 commit

Fuse skip layernorm (#683) · 1bfb147d

Paul Fultz II authored Nov 20, 2020



* Unify the vectorized and non-vectorized path

* Formatting

* Make fusion easily extendable

* Add skip layernorm fusion

* Formatting

* Call correct layernorm function

* Fix compile errors

* Add DCE

* Add test for skip layernorm

* Formatting

* Remove unused typedef

* Formatting

* Fix tidy issues

* Formatting
Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>

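For context, a skip-layernorm fusion folds the residual ("skip") add into the layernorm that follows it. The reference sketch below computes `layernorm(x + skip)` for one row over the last axis. The learned scale and bias are omitted, and the epsilon default is an assumption; `skip_layernorm` is an illustrative name, not the MIGraphX fused kernel.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// y = (s - mean(s)) / sqrt(var(s) + eps), where s = x + skip.
// Assumes x and skip are non-empty and the same size.
std::vector<float> skip_layernorm(const std::vector<float>& x,
                                  const std::vector<float>& skip,
                                  float eps = 1e-5f)
{
    const std::size_t n = x.size();
    std::vector<float> s(n);
    float mean = 0.0f;
    for(std::size_t i = 0; i < n; ++i)
    {
        s[i] = x[i] + skip[i]; // the residual add that the fusion absorbs
        mean += s[i];
    }
    mean /= static_cast<float>(n);
    float var = 0.0f;
    for(float v : s)
        var += (v - mean) * (v - mean);
    var /= static_cast<float>(n);
    const float inv_std = 1.0f / std::sqrt(var + eps);
    for(float& v : s)
        v = (v - mean) * inv_std;
    return s;
}
```

Fusing the add means the intermediate sum `s` never has to be written to and read back from global memory as a separate tensor.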

16 Nov, 2020 1 commit

Normalize ops (#667) · 8443ecd1

Shucai Xiao authored Nov 16, 2020



* add a pass to normalize ops

* clang format

* add unit tests

* clang format

* code backup

* clang format

* code backup

* clang format

* add support for slice in the normalize_op function

* clang format

* add operation method api for whether we need to call normalize_op

* clang format

* fix review comments

* clang format

* rename a function name

* clang format

* change compute_shape to normalize_compute_shape for corresponding operators

* clang format

* remove unnecessary code

* fix various issues

* clang format

* add attributes to operators having axis attributes

* clang format

* fixed jenkins build error

* clang format

* fix a bug related to slice

* clang format

* code backup

* clang format

* code backup

* clang format

* rename a file

* fix cppcheck error

* some code refinement

* clang format

* change attributes to enum

* clang format

* refine the enum

* clang format

* remove unnecessary code

* add unit tests for more code coverage and fixed a bug

* clang format

* remove unnecessary changes

* change normalize_axes to normalize

* clang format

* revert back the changes in broadcast.hpp

* rename normalize_axes to normalize

* fix review comments

* clang format

* Add flag to enable cpu backend

* Make buffers shared

* Enable optimizations

* Formatting

* Try to avoid ambiguous assign in value class

* fixed a build error

* clang format

* add the normalize_ops pass to the ref target

* refactor program to module to normalize_ops pass
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

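The core idea behind normalizing ops is rewriting attribute values into a canonical range once, so later passes need not handle special cases. ONNX allows negative axes counted from the end (axis -1 on a rank-4 tensor is axis 3), and Slice indices are clamped rather than rejected. The helpers below are hypothetical sketches of those two rules, not the MIGraphX normalize_op API; the slice rule assumes a positive step.

```cpp
#include <cstdint>
#include <stdexcept>

// Map an axis in [-rank, rank) to [0, rank); reject anything else.
std::int64_t normalize_axis(std::int64_t axis, std::int64_t rank)
{
    if(axis < -rank or axis >= rank)
        throw std::out_of_range("axis out of range");
    return axis < 0 ? axis + rank : axis;
}

// Slice start/end (positive step): a negative index counts from the end,
// and the result is clamped to [0, dim].
std::int64_t normalize_slice_index(std::int64_t idx, std::int64_t dim)
{
    if(idx < 0)
        idx += dim;
    if(idx < 0)
        return 0;
    return idx > dim ? dim : idx;
}
```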

15 Oct, 2020 1 commit

Added greater and less operators (#660) · 48ffbfa5

turneram authored Oct 15, 2020



* Added greater and less operators

* Fixed ops_test.cpp

* Set commutative to false for less, greater

* Refactored parse_equal/less/greater into parse_compare_op

* Removed unnecessary function attributes() from greater.hpp/less.hpp

* Added op_name arguments

* Removed local settings

* Formatting

* Missing comma

* Formatting

* Formatting

* Formatting

* Formatting

* Formatting

* Missing space
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>


30 Sep, 2020 1 commit

Add hip clang builds to jenkins (#651) · f28a62ea

Paul Fultz II authored Sep 30, 2020

* Make global variables const

* Tidy fixes

* Disable some lints

* Formatting

* Fix tidy const

* Formatting

* Add missing const keywords

* Formatting

* More fixes

* Fix remaining tidy issues

* Formatting

* Fix rocblas function call

* Formatting

* Fix nodiscard warnings

* Formatting

* Use named parameters

* Remove overload

* Add overload

* Remove noncps

* Use named param for node

* Add auto register header

* Use named parameters

* Refactor jenkinsfile

* Fix shadow

* Add missing body variable

* Add more const methods

* Add hip-clang docker builds

* Remove comments

* Add clang-format

* Add more const

* Formatting

* Rename stage

* Disable check

* Add another const

* Add python 2 dev packages

* Add sphinx to dockerfile


31 Aug, 2020 1 commit

Instance norm kdims (#620) · 69925294

kahmed10 authored Aug 31, 2020



* fix parsing to kdims

* add 5d size

* fix assert

* add 3d test

* formatting
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
