Commits · 16e5b5d086183b37cc37d9f6ed9381b74f2bbccb · gaoqiong / MIGraphX

28 Feb, 2022 8 commits
- clang format · 5f37917f
  Shucai Xiao authored Feb 28, 2022
  
  5f37917f
- backup additional changes · f50bcff2
  Shucai Xiao authored Feb 28, 2022
  
  f50bcff2
- refine contiguous gpu implementation · 702412b1
  Shucai Xiao authored Feb 28, 2022
  
  702412b1
- clang format · 562724bf
  Shucai Xiao authored Feb 28, 2022
  
  562724bf
- backup the mul_add latest implementation · 83f89182
  Shucai Xiao authored Feb 28, 2022
  
  83f89182
- reimplementation of mul_add · 67903751
  Shucai Xiao authored Feb 27, 2022
  
  67903751
- clang format · 287f7e9f
  Shucai Xiao authored Feb 27, 2022
  
  287f7e9f
- backup temp changes · b7d1ff95
  Shucai Xiao authored Feb 27, 2022
  
  b7d1ff95
26 Feb, 2022 2 commits
- change mul_add gpu implementation to use half2 for fp16 data type · 5f4e8561
  Shucai Xiao authored Feb 26, 2022
  
  5f4e8561
- reimplement mul_add kernel in a simple way · 9e610129
  Shucai Xiao authored Feb 26, 2022
  
  9e610129
08 Feb, 2022 3 commits
- revert nary · 9f755219
  Khalique Ahmed authored Feb 07, 2022
  
  9f755219
- formatting · 96c82f21
  Khalique Ahmed authored Feb 07, 2022
  
  96c82f21
- use other device name function · cb965031
  Khalique Ahmed authored Feb 07, 2022
  
  cb965031
31 Jan, 2022 1 commit
- formatting · 8d21ccdf
  Khalique Ahmed authored Jan 31, 2022
  
  8d21ccdf
09 Dec, 2021 1 commit

Softmax perf optimization (#1014) · 2e337c7f

Shucai Xiao authored Dec 09, 2021

Changed the number of threads in a block from 256 to 128.
Increased the maximum number of blocks in the kernel from 256 to 1M.
When the axis is the last dimension, removed the index computation since it is not required.

With these changes, we get about a 2x speedup over the develop branch for the softmax op used in the BertSquad model.
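
To illustrate why the last-axis case needs no index computation, here is a minimal Python sketch. It is not the MIGraphX kernel; the function names and the flat row-major list layout are assumptions made for illustration. When the softmax axis is the last dimension, each row is a contiguous slice of the buffer, whereas any other axis requires turning a position into strided offsets first.

```python
import math

def softmax_last_axis(data, axis_len):
    """Softmax over the last axis of a contiguous row-major buffer.

    Each row is the contiguous slice data[start:start + axis_len],
    so no multi-dimensional index has to be computed.
    """
    out = [0.0] * len(data)
    for start in range(0, len(data), axis_len):
        row = data[start:start + axis_len]
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out[start:start + axis_len] = [e / total for e in exps]
    return out

def softmax_any_axis(data, dims, axis):
    """General case: elements along `axis` are strided, so offsets are computed."""
    strides = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    out = [0.0] * len(data)
    outer = len(data) // dims[axis]
    for b in range(outer):
        # Decompose b into a multi-index over every dimension except `axis`.
        rem, base = b, 0
        for i in range(len(dims) - 1, -1, -1):
            if i == axis:
                continue
            base += (rem % dims[i]) * strides[i]
            rem //= dims[i]
        offs = [base + k * strides[axis] for k in range(dims[axis])]
        peak = max(data[o] for o in offs)
        exps = [math.exp(data[o] - peak) for o in offs]
        total = sum(exps)
        for o, e in zip(offs, exps):
            out[o] = e / total
    return out
```

Both functions give the same result when `axis` is the last dimension; the first simply skips the per-element offset arithmetic.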

2e337c7f

08 Oct, 2021 1 commit

Nonzero op extension (#870) · 0879b5f1

Shucai Xiao authored Oct 08, 2021

This PR is for the nonzero operator with static output shape.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
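
As a rough illustration of what a static output shape means for nonzero, the Python sketch below always returns a table of size rank x element-count and reports how many columns are valid. This is not the MIGraphX implementation: the function name, the returned count, and the use of 0 as the fill value are all assumptions.

```python
def nonzero_static(data, dims):
    """Indices of nonzero elements as a rank x N table, where N = len(data).

    The table size does not depend on how many elements are nonzero;
    unused columns keep the fill value 0 (an assumed choice).
    """
    total = len(data)
    rank = len(dims)
    out = [[0] * total for _ in range(rank)]
    count = 0
    for flat, value in enumerate(data):
        if value != 0:
            rem = flat
            # Convert the flat row-major position into one index per dimension.
            for d in range(rank - 1, -1, -1):
                out[d][count] = rem % dims[d]
                rem //= dims[d]
            count += 1
    return out, count
```

A caller would read only the first `count` columns of the returned table.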

0879b5f1

01 Oct, 2021 1 commit

Add multinomial op (#954) · 0b7672d7

turneram authored Oct 01, 2021

Add multinomial op to onnx parser with ref and GPU implementations.

The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative distribution function (CDF). The multinomial operator multiplies the random values by the sum of the CDF and returns the index of the first element of the CDF that is greater than the result, representing samples randomly drawn from [0, class_size) that follow the log-probability distribution.

Resolves #821
Co-authored-by: Shucai Xiao <shucai@gmail.com>
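
The sampling scheme described above can be sketched in plain Python as follows. This is a minimal illustration for a single batch row, not the MIGraphX code: the function name is hypothetical, and it assumes that the "sum of the CDF" refers to the final (largest) CDF value and that the inputs are log-probabilities that are exponentiated first.

```python
import math
from itertools import accumulate

def multinomial_sample(log_probs, rand_vals):
    """Draw one class index per value in rand_vals (each in [0, 1))."""
    peak = max(log_probs)
    # Unnormalized probabilities; subtracting the max avoids overflow.
    weights = [math.exp(lp - peak) for lp in log_probs]
    cdf = list(accumulate(weights))  # running sum of the weights
    total = cdf[-1]  # assumption: this is what the commit calls the sum of the CDF
    samples = []
    for r in rand_vals:
        target = r * total
        # Index of the first CDF element strictly greater than the target.
        idx = next((i for i, c in enumerate(cdf) if c > target), len(cdf) - 1)
        samples.append(idx)
    return samples
```

Scaling the random value by the total instead of normalizing the CDF avoids a division per class, which is presumably why the operator is described this way.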

0b7672d7

27 Sep, 2021 1 commit

Dpp opts for wavefront 32 (#951) · 6e2df9de

kahmed10 authored Sep 27, 2021

Checks wavefront size, then changes implementation and number of threads for DPP reduce

6e2df9de

16 Sep, 2021 1 commit

Loop operator (#853) · a275f590

Shucai Xiao authored Sep 16, 2021

Add Loop operator for opset version 13.
Notes: 1) The default max iteration number is 10 if none is provided.
2) To change the max iteration number, a user can set max_loop_iterations in the onnx_option struct when parsing a model.
3) The returned shape of the scan output comes from max_loop_iterations even if the actual number of loop iterations is smaller. This issue also applies to other operators such as NonZero and NonMaxSuppression. Issue #948 was created to track this; it is to be resolved later.
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
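
The effect of note 3 can be shown with a small Python emulation. It is only a sketch of the behavior described in the notes, not MIGraphX code: `run_loop`, the `body` signature, the zero padding value, and capping the trip count at the maximum are assumptions.

```python
def run_loop(body, state, trip_count=None, max_loop_iterations=10, pad=0):
    """Emulate a Loop whose scan output has a statically sized leading dimension.

    body(i, state) -> (keep_going, new_state, scan_value)
    """
    if trip_count is None:
        limit = max_loop_iterations
    else:
        limit = min(trip_count, max_loop_iterations)
    # The scan buffer is allocated up front at the maximum size.
    scan = [pad] * max_loop_iterations
    executed = 0
    for i in range(limit):
        keep_going, state, value = body(i, state)
        scan[i] = value
        executed += 1
        if not keep_going:
            break
    return state, scan, executed
```

Even when the loop stops early, `scan` keeps length `max_loop_iterations`, so a caller needs `executed` to know which entries are meaningful.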

a275f590

02 Sep, 2021 2 commits

Refactor where op (#918) · ebbaf8fc

turneram authored Sep 02, 2021

Implement the Where operator for the CPU and GPU for better performance.

ebbaf8fc

Topk op (#877) · 521b57a2

Shucai Xiao authored Sep 01, 2021



* add topk operator for ref, cpu and gpu
* Hash modules for quicker lookup of modules
* add onnx unit test
* add unit tests for the topk operator
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

521b57a2

09 Aug, 2021 1 commit
- check for divisor encodable or not, fallback if needed (#906) · a8d86615
  Cagri Eryilmaz authored Aug 09, 2021
  * check for divisor encodable or not, fallback if needed
  * verify test for retinaface case
  a8d86615
09 Jul, 2021 1 commit
- fix review comments · 4e861c7c
  Shucai Xiao authored Jul 08, 2021
  
  4e861c7c
08 Jul, 2021 3 commits

Add inclusive scan on the GPU (#872) · 6ba279cc

Paul Fultz II authored Jul 08, 2021



* Add initial scan operator

* Formatting

* Fix with a working test

* Fix bugs

* Formatting

* Formatting

* Simplify

* Formatting

* Use non-power of 2 for test

* Make pointer
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

6ba279cc

clang format · 4a24a2dd
Shucai Xiao authored Jul 08, 2021

4a24a2dd
fix review comments · 0281a411
Shucai Xiao authored Jul 08, 2021

0281a411

25 Jun, 2021 4 commits
- clang format · 27c0ae08
  Shucai Xiao authored Jun 25, 2021
  
  27c0ae08
- refine unit tests · dd651742
  Shucai Xiao authored Jun 25, 2021
  
  dd651742
- clang format · 7135da72
  Shucai Xiao authored Jun 25, 2021
  
  7135da72
- gpu implementation of the scatter operator · 973cafd4
  Shucai Xiao authored Jun 25, 2021
  
  973cafd4
11 Jun, 2021 1 commit
- Disable dpp for wavefront 32 (#857) · 764b1e44
  Paul Fultz II authored Jun 11, 2021
  Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
  764b1e44
08 Jun, 2021 1 commit

Reverse Op (#846) · 9c54fc4f

Cagri Eryilmaz authored Jun 08, 2021



* init reverseOp branch: ref op + ref test. WIP

* first passing basic test

* cleanup

* additional axis implementation

* additional test

* ref op implementation vec to int for axis

* ref op test change for axis

* initial gpu files and test

* updates to implementation and test

* fixed some issues

* clang format

* cleanup

* formatting

* removing comments

* remove local size, back to default

* update tests: replace with std functions

* multiple axis for reverse op

* fix a build error

* clang format

* more tests

* fix a bug for the reverse device function

* clang format

* fix a bug

* clang format

* ref test updates, multiaxis

* formatting
Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

9c54fc4f

03 May, 2021 1 commit

Reduce types generated for hip kernels (#814) · 3becd974

Paul Fultz II authored May 03, 2021



* Remove unused data types

* Formatting

* Reduce types generated for hip kernels

* Formatting

* Fix onnx tests

* Formatting
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

3becd974

27 Apr, 2021 1 commit
- change softmax block size · 32b69ceb
  Khalique Ahmed authored Apr 26, 2021
  
  32b69ceb
26 Apr, 2021 1 commit
- add fp16 fixes · 781ce146
  Khalique Ahmed authored Apr 26, 2021
  
  781ce146
05 Mar, 2021 1 commit

Infer outputs in tf (#764) · 197b3a46

kahmed10 authored Mar 05, 2021



* fix relu6

* add more transposes

* add multi output

* formatting

* add tests

* formatting

* fix tests

* change to_nchw for outputs

* add python api

* fix cppcheck

* remove variable

* fix lambda

* add multi_output test

* add more tests and merge

* fix help message

* debugging work

* fix valid op string

* formatting

* manual merge

* mark function as const
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
Co-authored-by: Shucai Xiao <shucai@gmail.com>

197b3a46

26 Feb, 2021 1 commit

changes for not operator (#735) · ebf8bd20

Cagri Eryilmaz authored Feb 26, 2021



* changes for not operator

* changed name of the op from unary_not to not

* Added tests for op and onnx parsing

* reordering not_test in onnx_test.cpp

* not operator -- gpu implementation

* added bool test for not operator

* Added test and missing links for not operator on GPU

* typo fix

* adding .onnx test files for not operator

* formatting
Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

ebf8bd20

08 Feb, 2021 1 commit

Add a pass to remove unsupported data types (#738) · 3d24a21c

Paul Fultz II authored Feb 07, 2021



* Add eliminate_data_type pass

* Formatting

* Auto convert quant ops

* Formatting

* Flip the order of decompose

* Compute max size differently

* Formatting

* Clamp values in convert

* Formatting

* Fix loss of precision in reduce

* Formatting

* Fix bugs in reduction

* Fix accumulator type in reference softmax implementation

* Formatting

* Update convert test

* Remove unused variables

* Remove unnecessary quant_dot check

* Formatting

* Add tests

* Formatting

* Remove unused code

* Remove duplicate ops

* Remove blaze dependency

* Use set since shape::type_t is not hashable on gcc 5

* Formatting
Co-authored-by: Shucai Xiao <shucai@gmail.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

3d24a21c

19 Jan, 2021 1 commit

Logical ops (#718) · 4d46cbdb

Shucai Xiao authored Jan 19, 2021

* add the and operator

* clang format

* add unit tests for the and operator

* clang format

* change the and name to logical_and and add the logical_or, logical_xor

* clang format

* add onnx unit tests for or and xor

* add more unit tests

4d46cbdb

08 Jan, 2021 1 commit

Revamp CI infrastructure (#706) · ceb4ca09

Paul Fultz II authored Jan 08, 2021



* Add build and test github workflow

* Fix cget command

* Remove def-requirements.txt

* Add tmate session to debug workflow

* Run tmate session after installing dependencies

* Print date periodically

* Add clang tidy action

* Separate build and run container in two different jobs

* Run bash script

* Remove interactive flag

* Try to mount the files

* Try to use the github workspace

* Without double braces

* Use env variable

* Pipe bash script in

* Run using hip-clang

* Use correct path

* Add verbose

* Remove j flag

* Only run for onnx file to debug

* Manually run clang-tidy

* Remove quiet flag

* Print header file

* Printout environment

* Remove extra defines

* Remove fixits and config flag

* Show ldd

* Add tmate session

* Run onnx protobuf first

* Generate proto for tensorflow

* Update cppcheck version

* Fix some cppcheck issues

* Add const

* Cppcheck fixes

* Formatting

* Fix more cppcheck issues

* Run two jobs

* Cache analysis and run format checking

* Fix yaml issues

* Fix yaml issues

* Fix indentation

* Switch to hip-clang for main docker file

* Use hip-clang in the readme

* Fixes for jenkins

* Use ccache to build

* Combine file

* Set restore keys

* Change stage name

* Build with ccache

* Add missing dependency for ccache

* Build debug with codecov

* Fix workflow syntax

* Fix list

* Use quotes

* Go to correct build path

* Install lcov

* Use sudo

* Echo all commands

* Setup tmate

* Add verbose output

* Build with cmake directly

* Add pthread flag

* Remove python config

* Continue on error

* Use on or off for cmake flag

* Use always upload cache

* Verbose output

* Verbose output from build

* Build one target

* Reduce debug symbols

* Increase garbage collection

* Remove dmesg

* Increase it to 20

* Update rocm cmake version

* Remove jobs from jenkins

* Run on all 3 ubuntus

* Remove gcc 5 jobs

* Dont add flag on 16.04

* Only upload coverage on 18.04

* Dont build for ubuntu 20.04

* Use matrix.os

* Use O2 for hip-clang since lower optimizations are broken

* Use rocm 3.0

* Pass ccache as cmake variable instead of env variable

* Build miopen from source

* Show ccache statistics

* Print log information

* Set compression level

* Use hash dir

* Set hashdir

* Install clang ocl from system

* Up compression level

* Add locale

* Increase cache size to 1G

* Lower compression level to 9

* Remove split dwarf

* Remove Og

* Add back Og

* Separate debug and codecov

* Add missing backslash

* Garbage collect more often

* Add missing locales package

* Use Os

* Install onednn in docker and run tests

* Include target headers in tests

* Increase timeout

* Remove if condition

* Make flag public

* Suppress memory leaks in onednn

* Use equal

* Add gh annotations

* Update rocm-cmake version

* Add ldconfig
Co-authored-by: Shucai Xiao <shucai@gmail.com>

ceb4ca09