Commits · nonzero_track_data_position · gaoqiong / MIGraphX

27 Jun, 2023 1 commit

Fix Nonzero to track data value with sentinel value based on elements · 28727db2

Ted Themistokleous authored May 25, 2023

We can't change the behaviour of the nonzero op, and we currently pad the output
with zeros. Unfortunately, this makes the following cases ambiguous:

1. When the only nonzero element is at the first index, the rest of the output is padded
with zeros, so it is not obvious whether the first value is a valid index or padding.

2. When the nonzero output is used as indices. The padded value 0 is still a valid
index, so gather/gatherND and other ops will treat index 0 as valid and operate on it.

Padding with a sentinel value equal to the number of static elements in the desired
shape lets consumers of the nonzero output determine how many elements are valid:
any value inside the valid index range is a real index, and the sentinel marks padding.

I originally intended to use -1, but not all data types can represent it, for example
unsigned values or booleans.

28727db2
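A minimal 1-D sketch of the padding choice, assuming a flat input and a flat index output (as in ONNX NonZero, the real op emits one row of indices per input dimension). The helpers `nonzero_padded` and `count_valid` are hypothetical; they only show why a sentinel equal to the element count is unambiguous while a 0 pad is not.

```python
from typing import List, Sequence


def nonzero_padded(data: Sequence[float], pad_with_sentinel: bool = True) -> List[int]:
    """Flat indices of nonzero elements, padded to the static size len(data).

    Hypothetical helper for illustration only.
    """
    n = len(data)
    indices = [i for i, v in enumerate(data) if v != 0]
    # A 0 pad looks exactly like a real hit at index 0; the sentinel n is
    # outside [0, n), so it can never be mistaken for a valid index.
    pad = n if pad_with_sentinel else 0
    return indices + [pad] * (n - len(indices))


def count_valid(indices: Sequence[int], n: int) -> int:
    # Every value in [0, n) is a real index; the sentinel n marks padding.
    return sum(1 for i in indices if 0 <= i < n)
```

For `[5, 0, 0, 0]`, zero padding gives `[0, 0, 0, 0]`, which reads like four hits at index 0, while the sentinel version gives `[0, 4, 4, 4]`, from which exactly one valid index is recovered. Unlike -1, the sentinel is non-negative, which matches the unsigned-type concern above.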

26 Jun, 2023 1 commit
- Print max,min,mean and stddev values for TRACE_EVAL = 2 (#1864) · 84a8f450
  Umang Yadav authored Jun 26, 2023
  
  84a8f450
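As a rough illustration of the statistics this TRACE_EVAL = 2 change prints, here is a single-pass sketch using Welford's running mean and variance. The `summarize` helper is hypothetical, and reporting the population (not sample) standard deviation is an assumption.

```python
import math
from typing import Iterable, Tuple


def summarize(values: Iterable[float]) -> Tuple[float, float, float, float]:
    """Return (max, min, mean, stddev) of a buffer in one pass.

    Hypothetical helper: Welford's update keeps a running mean and a running
    sum of squared deviations (m2); stddev here is the population form.
    """
    count = 0
    mean = 0.0
    m2 = 0.0
    hi, lo = -math.inf, math.inf
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the updated mean, per Welford
        hi = max(hi, x)
        lo = min(lo, x)
    if count == 0:
        raise ValueError("empty buffer")
    return hi, lo, mean, math.sqrt(m2 / count)
```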
23 Jun, 2023 1 commit
- Remove clamping for converts (#1853) · e794a63c
  Umang Yadav authored Jun 23, 2023
```
Fixes #1852  Fixes #1847
```
  e794a63c
22 Jun, 2023 2 commits

[mlir] Adding mlir quant_dot operator support (#1816) · 01342ae1
Zhuoran Yin authored Jun 22, 2023
```
Add mlir quant_dot operator support
```
01342ae1

Update install prereqs python fix (#1782) · c5cd87ce

Ted Themistokleous authored Jun 22, 2023

* Update install_prereqs.sh to handle 22.04 defines

Needed to run containers with 22.04

* Add Dockerfile for Ubuntu 22.04 and ROCm 5.5

Updated the Dockerfile to use ROCm 5.5 and Ubuntu 22.04 for building MIGraphX.
Verified that make -j$(nproc) check runs successfully with it.

* Clean this up since it's breaking CI

* Clean up install prereqs some more.

-use one protobuf version
-remove extra python3.8 installs from 3.10 case

* Move the protobuf comment

* Move Dockerfile for 22.04 to Dockerfiles/ folder

* Move and rename 2204 docker file

Remove Docker_** from the name and move these files to tools/docker.

* Add pip3 installs to be shared between python versions

* Add Package pin from repo.radeon.com

* Add CMAKE_ARG ONNX_USE_PROTOBUF_SHARED_LIBS for every default python dist

Set this to be default as part of installing prereqs

---------
Co-authored-by: Charlie Lin <charlie.lin@amd.com>
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>

c5cd87ce

21 Jun, 2023 2 commits
- Remove where op workaround for ck (#1854) · 226da497
  Paul Fultz II authored Jun 21, 2023
```
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
```
  226da497
- use fast_math flag instead of ENV flag for GELU (#1855) · 0802c19e
  Umang Yadav authored Jun 21, 2023
```
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
```
  0802c19e
20 Jun, 2023 1 commit

Update onnxruntime main fbf08c4b4dce5da245189203d9f6cfc41f6663a2 (#1843) · db63cc77

github-actions[bot] authored Jun 20, 2023

Co-authored-by: causten <causten@users.noreply.github.com>
Co-authored-by: Ted Themistokleous <107195283+TedThemistokleous@users.noreply.github.com>

db63cc77

17 Jun, 2023 3 commits

Add trace for SIMPLIFY_ALGEBRA matches (#1838) · a0fa3742

Ted Themistokleous authored Jun 17, 2023

* Add trace for SIMPLIFY_ALGEBRA matches

* Fix format

* handle review comments from Umang

-int to size_t for trace
-move env arg to top of simplify_algebra.cpp
-handle overload better for find_matches

* Rename trace_mod param to trace_pass

More representative naming for what this trace flag does

a0fa3742

Update CK commit hash and add gfx940 to supported archs (#1842) · b8898d7e

turneram authored Jun 17, 2023

* Add initial ck_gemm code

* Format

* Add additional src files

* Format

* Add include

* Simplify fuse_ck

* Format

* Rename var

* Enable pass

* Update ck version

* Fix include

* Add group stride

* Disable warnings for ck headers

* Format

* Add unpack array

* Add interface to enable tuning

* Format

* Update compile_ops to handle tuning config

* Format

* Add some comments

* Move time_op to migraphx_gpu

* Add benchmarking

* Refactor

* Format

* Add lift class macro

* Use device name

* Format

* Generate configs

* Format

* Pass tuning parameter

* Move data type to is_ck_gemm matcher

* Format

* Add problem_cache to avoid retuning same configs

* Format

* Format

* Mark the problems

* Format

* Use is_null

* Format

* Resize vector

* Only tune with exhaustive tuning

* Format

* Use assert

* Format

* Tidy fixes

* More tidy fixes

* Format

* Add license to missing files

* Format

* Use transform

* Format

* Fix tidy

* Format

* Fix cppcheck issues

* Format

* Add static_assert

* Add ops header

* Add assertion in batcher

* Format

* Improve the batch fold check

* Format

* Add where op workaround for CK

* Skip if any input is not a supported ck type

* Format

* Check batch is standard

* Format

* Remove redundant static keyword

* Update commit hash

* Fix error when running without --exhaustive-tune

* Formatting

* Formatting

* Remove fuse_ck_gemm_softmax_gemm

* Update ck hash

* Correct spelling mistake

* Remove commented out logic from fuse_ck

* Remove unused include and add comment

* Formatting

* Remove redundant get_shape and remove ck_gemm from names

* Formatting

* Allow for mixed types with int8 gemms

* Formatting

* Add back find_package from merge

* Update CK commit hash and add gfx940 to fuse_ops supported archs

* Formatting

* Update CK hash

b8898d7e
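The "Add problem_cache to avoid retuning same configs" step above is a memoization pattern. The sketch below is hypothetical: the class name, the (op, problem) key, and the `benchmark` callback are illustrative assumptions, not MIGraphX's actual problem_cache interface. It only shows why caching the winning config per problem skips re-benchmarking identical problems.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple


class ProblemCache:
    """Hypothetical sketch: remember the fastest config per (op, problem)."""

    def __init__(self) -> None:
        self._best: Dict[Tuple[str, Hashable], Hashable] = {}
        self.benchmarks_run = 0

    def get_or_tune(
        self,
        op: str,
        problem: Hashable,
        candidates: Iterable[Hashable],
        benchmark: Callable[[Hashable], float],
    ) -> Hashable:
        key = (op, problem)
        if key not in self._best:
            def timed(config: Hashable) -> float:
                self.benchmarks_run += 1
                return benchmark(config)

            # Benchmark only on a cache miss and keep the fastest candidate.
            self._best[key] = min(candidates, key=timed)
        return self._best[key]
```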

Fix convert operation for NaNs (#1840) · 2d635f91

Umang Yadav authored Jun 17, 2023

* Fix convert for the NaNs

* NaN compares unequal to everything, including itself, so use std::isnan()

* formatting

* formatting

* formatting

* add extra tests

2d635f91
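To illustrate why NaN needs an explicit check in a convert, here is a sketch of a saturating float-to-FP16 convert in Python. The saturation policy (finite out-of-range values clamp to ±65504, inf passes through) is an assumption for illustration, not necessarily what MIGraphX does; `convert_to_fp16` and `same_value` are hypothetical names.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 binary16 value


def convert_to_fp16(x: float) -> float:
    """Saturating float -> fp16 convert sketch that preserves NaN and inf.

    Assumption: finite out-of-range values saturate to +/-FP16_MAX.
    """
    if math.isnan(x) or math.isinf(x):
        # NaN fails every ordered comparison, so a min/max clamp cannot be
        # trusted with it; detect it explicitly and pass it through.
        return x
    clamped = min(max(x, -FP16_MAX), FP16_MAX)
    # struct's 'e' format rounds to the nearest binary16 value.
    return struct.unpack("<e", struct.pack("<e", clamped))[0]


def same_value(a: float, b: float) -> bool:
    # Test helper: NaN != NaN, so equality checks must special-case it.
    return (math.isnan(a) and math.isnan(b)) or a == b
```

A naive clamp is also argument-order dependent: in Python `max(math.nan, 0.0)` returns nan while `max(0.0, math.nan)` returns 0.0, and C++ `std::max` behaves the same way because it returns its first argument unless `a < b`.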

16 Jun, 2023 2 commits

2+ input multibroadcast and 2+ input dynamic shape insert_common_op (#1836) · 27bb8ca6

Charlie Lin authored Jun 16, 2023

* initial

* Added tests and new functionality

* Update optimals handling

* Simplify conditionals

* Ref test, update docs

* Remove comment, suggestion unclear

---------
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>

27bb8ca6
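For static shapes, extending multibroadcast from two inputs to N inputs amounts to folding the pairwise NumPy-style rule over all inputs. This is a sketch under that assumption; dynamic dimensions (min/max/optimal ranges), which this change also covers, are out of scope, and the helper names are hypothetical.

```python
from functools import reduce
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def broadcast_two(a: Sequence[int], b: Sequence[int]) -> Shape:
    """Common shape of two static shapes under NumPy-style broadcasting."""
    rank = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so dimensions align from the right.
    pa = (1,) * (rank - len(a)) + tuple(a)
    pb = (1,) * (rank - len(b)) + tuple(b)
    out = []
    for x, y in zip(pa, pb):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"cannot broadcast dimensions {x} and {y}")
    return tuple(out)


def broadcast_many(*shapes: Sequence[int]) -> Shape:
    # Pairwise broadcasting is associative, so N inputs fold one at a time;
    # the empty shape () (a scalar) is the identity element.
    return reduce(broadcast_two, shapes, ())
```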

Fallback on C arrays for add_embed_library on non-linux tests (#1804) · 013d4829
Paul Fultz II authored Jun 16, 2023

013d4829

15 Jun, 2023 2 commits

use __hmax, __hmin (#1813) · d208adfc
Umang Yadav authored Jun 15, 2023

d208adfc

fix parse_instancenorm to create broadcast and multibroadcast instruc… (#1715) · 41ba30d5

Brian Pickrell authored Jun 15, 2023

* fix parse_instancenorm to create broadcast and multibroadcast instructions with two dynamic-shape arguments instead of one. Their make_op() functions don't support dynamic shapes when called with one input, which caused an error when parsing an ONNX 3d-unet model

* Use add_common_op() to create multibroadcast op.

* add verification and parsing test for instance_norm with dynamic input.  Parse test doesn't pass.

* fix for test; still doesn't pass

* another fix for test; still doesn't pass

* work in progress, instance_norm_dyn_batch_test works but instance_norm_test doesn't

* fix onnx instancenorm tests to match parser changes.  Passes all check tests

* Updated comments explaining usage of add_common_op()

* hand-merged conflicts with develop

* fix instance_norm_half_test after merge

* add ONNX test instance_norm_dyn_batch_half_test

* add shape test cases broadcast_1in_dyn_error and multibroadcast_1in_dyn_error_0

41ba30d5

14 Jun, 2023 2 commits
- Fix TRACE_EVAL > 1 (#1835) · 5bf067ed
  Umang Yadav authored Jun 14, 2023
```
* add fix for the trace_eval

* Add throw for the debug builds

* Formatting

---------
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
```
  5bf067ed
- Print message from driver if offload copy is set for compiled program (#1802) · aa508e1d
  Umang Yadav authored Jun 14, 2023
  
  aa508e1d
13 Jun, 2023 1 commit
- Fix shape typo in API test (#1787) · 193f105d
  Charlie Lin authored Jun 13, 2023
  
  193f105d
12 Jun, 2023 1 commit
- Enable reshape on nonstandard shapes (#1681) · 0dae73fa
  Paul Fultz II authored Jun 12, 2023
  
  0dae73fa
09 Jun, 2023 3 commits
- Enable hipRTC (#1827) · c900e382
  Chris Austen authored Jun 09, 2023
  
  c900e382
- Fix compile warnings for shadowing variable names (#1825) · dfde6d07
  Umang Yadav authored Jun 09, 2023
  
  dfde6d07
- Add missing specialization for the `nullptr` for the hash function (#1824) · 26aabd2a
  Umang Yadav authored Jun 09, 2023
```
#1791 added a hash function for the value class. It uses the Visit function and has specializations for bool_type and the <vector> type, but was missing a specialization for nullptr, which caused compilation issues on RHEL, SLES, and CentOS.
```
  26aabd2a
08 Jun, 2023 2 commits
- Add initial CK integration plus auto-tuning for kernels (#1791) · 25af8710
  Paul Fultz II authored Jun 08, 2023
```
Enable with MIGRAPHX_ENABLE_CK=1 and the --exhaustive-tune flag
```
  25af8710
- disable hipRTC temporarily (#1817) · e5a33aad
  Chris Austen authored Jun 07, 2023
  
  e5a33aad
06 Jun, 2023 2 commits

re-enable hiprtc (#1812) · 85ff4f85
Umang Yadav authored Jun 06, 2023

85ff4f85

Conditionally enable GeLU approximation (#1810) · c5d0c5b6

Umang Yadav authored Jun 05, 2023

The sigmoid approximation for GeLU was introduced in #1299 for FP16. It is known to give better performance but lower accuracy. https://arxiv.org/pdf/1606.08415.pdf

c5d0c5b6
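The trade-off can be checked numerically. The sketch compares exact GeLU, x·Φ(x), with the sigmoid approximation x·σ(1.702x) given in the linked paper. The helper names and the [-6, 6] grid are illustrative choices; on that grid the largest absolute difference is on the order of 0.02.

```python
import math


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids exp overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gelu_exact(x: float) -> float:
    # GELU(x) = x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_sigmoid(x: float) -> float:
    # Sigmoid approximation from arXiv:1606.08415: x * sigmoid(1.702 * x).
    return x * _sigmoid(1.702 * x)


def max_abs_error(lo: float = -6.0, hi: float = 6.0, steps: int = 1200) -> float:
    # Largest |exact - approx| over an evenly spaced grid on [lo, hi].
    xs = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(abs(gelu_exact(x) - gelu_sigmoid(x)) for x in xs)
```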

05 Jun, 2023 1 commit

Test and doc update for shape.from_permutation() (#1742) · 68446f7a

Charlie Lin authored Jun 05, 2023

Changed the doc for find_permutation(shape) to make it clearer that it finds the permutation that would make the shape standard

68446f7a
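One way to picture "the permutation that would make the shape standard": order the axes from largest stride (outermost) to smallest stride (innermost). This is a sketch under the assumption that the shape has no broadcast (stride 0) or length-1 axes, whose strides do not pin down an order; it is not MIGraphX's implementation, and both helpers are hypothetical. A (2, 3, 4) tensor transposed by (2, 0, 1) has lens (4, 2, 3) and strides (1, 12, 4), and the sketch returns [1, 2, 0], the inverse of that transpose.

```python
from typing import List, Sequence


def find_permutation_sketch(lens: Sequence[int], strides: Sequence[int]) -> List[int]:
    """Hypothetical helper: axis order that would make a shape standard.

    Axes go from largest stride to smallest; ties fall back to axis index.
    """
    return sorted(range(len(lens)), key=lambda i: (-strides[i], i))


def is_standard(lens: Sequence[int], strides: Sequence[int]) -> bool:
    # Packed row-major: each stride equals the product of all later lengths.
    expected = 1
    for n, s in zip(reversed(lens), reversed(strides)):
        if s != expected:
            return False
        expected *= n
    return True
```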

04 Jun, 2023 1 commit
- default to ROCm 5.5 (#1808) · 5df11e0f
  Igor Mirosavljevic authored Jun 04, 2023
  
  5df11e0f
02 Jun, 2023 1 commit
- replace np.bool with bool as per numpy request (#1640) · 10c42663
  Chris Austen authored Jun 02, 2023
  
  10c42663
01 Jun, 2023 1 commit

Convert Fp16 instance-norm to FP32 temporarily (#1779) · 49b341d3

Umang Yadav authored Jun 01, 2023

Converting to FP32: the FP16 3d-unet model's accuracy comes out the same as FP32 accuracy.

Using the reduce_sum method in FP16: accuracy comes out ~0.9% lower than FP32, while keeping the entire model in FP16.

49b341d3
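Instance norm relies on reductions for the mean and variance, and FP16 accumulators lose precision quickly. The sketch below does not reproduce the 3d-unet measurement; it only shows, with a hypothetical `reduce_sum` helper that rounds after every addition, how an FP16 running sum stalls while an FP32 one stays close to the exact total.

```python
import struct
from typing import Callable, Iterable


def to_fp16(x: float) -> float:
    # Round to the nearest IEEE 754 binary16 value.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    # Round to the nearest IEEE 754 binary32 value.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_sum(values: Iterable[float], round_fn: Callable[[float], float]) -> float:
    acc = 0.0
    for v in values:
        acc = round_fn(acc + v)  # emulate an accumulator of the narrow type
    return acc


values = [to_fp16(0.1)] * 10000  # FP16 inputs, exact total 999.755859375
sum16 = reduce_sum(values, to_fp16)
sum32 = reduce_sum(values, to_fp32)
```

The FP16 sum stops growing at 256: above that, one FP16 step is 0.25, so adding about 0.1 rounds back to the same value.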

31 May, 2023 2 commits
- Check if generate files are different (#1789) · 37711924
  Paul Fultz II authored May 31, 2023
  
  37711924
- Update pass manager to handle multi-target compilation (#1672) · 9473e3a2
  Umang Yadav authored May 31, 2023
```
partially solves #1656
This PR only handles the compilation part of multi-target.
```
  9473e3a2
30 May, 2023 2 commits

Improvements to driver output (#1710) · d32ab85b

Paul Fultz II authored May 30, 2023

Use generate_argument instead of generate_literal for Python output, since generate_literal doesn't exist
Shorten the names for variables from the main module
Use prefix p_ for parameters
Use shorter variable m for main module in python

d32ab85b

Add option to use type erased matchers to reduce symbol names (#1755) · 55f420fb
Paul Fultz II authored May 30, 2023

55f420fb

29 May, 2023 2 commits
- input parameters cleanup (#1777) · 3c93c314
  Pavle Jacovic authored May 30, 2023
  
  3c93c314
- Ensure CI labels map correctly (#1780) · 3ea6ff7b
  Chris Austen authored May 29, 2023
  
  3ea6ff7b
28 May, 2023 1 commit
- Enable quantizing both int8 and fp16 in the driver (#1757) · 26c1efa5
  Paul Fultz II authored May 28, 2023
```
* Allow quantizing for both int8 and fp16
```
  26c1efa5
25 May, 2023 1 commit
- Update cpp generator to handle inf from float (#1758) · 763dd1da
  Ted Themistokleous authored May 25, 2023
```
Use std::numeric_limits::min/max() functions plus the appropriate value to encode -inf/inf 
```
  763dd1da
24 May, 2023 2 commits
- Change compiler_replace to a class that stores the code objects directly (#1739) · 37f5df20
  Paul Fultz II authored May 24, 2023
```
Enable retrieving the code object to do tuning in the future.
```
  37f5df20
- Update xdlops/rocblas fp32 arch (#1752) · 77042e30
  kahmed10 authored May 24, 2023
```
Refactor supported gfx archs
```
  77042e30