Commits · b8898d7e90cc8ff94ee83eee11bc51b079420bee · gaoqiong / MIGraphX

17 Jun, 2023 2 commits

Update CK commit hash and add gfx940 to supported archs (#1842) · b8898d7e

turneram authored Jun 17, 2023

* Add initial ck_gemm code

* Format

* Add additional src files

* Format

* Add include

* Simplify fuse_ck

* Format

* Rename var

* Enable pass

* Update ck version

* Fix include

* Add group stride

* Disable warnings for ck headers

* Format

* Add unpack array

* Add interface to enable tuning

* Format

* Update compile_ops to handle tuning config

* Format

* Add some comments

* Move time_op to migraphx_gpu

* Add benchmarking

* Refactor

* Format

* Add lift class macro

* Use device name

* Format

* Generate configs

* Format

* Pass tuning parameter

* Move data type to is_ck_gemm matcher

* Format

* Add problem_cache to avoid retuning same configs

* Format

* Format

* Mark the problems

* Format

* Use is_null

* Format

* Resize vector

* Only tune with exhaustive tuning

* Format

* Use assert

* Format

* Tidy fixes

* More tidy fixes

* Format

* Add license to missing files

* Format

* Use transform

* Format

* Fix tidy

* Format

* Fix cppcheck issues

* Format

* Add static_assert

* Add ops header

* Add assertion in batcher

* Format

* Improve the batch fold check

* Format

* Add where op workaround for CK

* Skip if any input is not a supported ck type

* Format

* Check batch is standard

* Format

* Remove redundant static keyword

* Update commit hash

* Fix error when running without --exhaustive-tune

* Formatting

* Formatting

* Remove fuse_ck_gemm_softmax_gemm

* Update ck hash

* Correct spelling mistake

* Remove commented out logic from fuse_ck

* Remove unused include and add comment

* Formatting

* Remove redundant get_shape and remove ck_gemm from names

* Formatting

* Allow for mixed types with int8 gemms

* Formatting

* Add back find_package from merge

* Update CK commit hash and add gfx940 to fuse_ops supported archs

* Formatting

* Update CK hash
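
The squashed history above mentions benchmarking generated configs under exhaustive tuning and a problem_cache that avoids retuning the same configs. The following is a minimal sketch of that caching idea, not MIGraphX's implementation: the class name echoes the commit log, but its interface (a string problem key, a config count, and a caller-supplied timing callback) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <map>
#include <string>

// Hypothetical cache mapping a problem key (e.g. a serialized gemm shape plus
// data type) to the index of the fastest tuning configuration found for it.
class problem_cache
{
    public:
    // Returns the best config index for `problem`, benchmarking every
    // candidate only the first time this problem is seen.
    std::size_t get_or_tune(const std::string& problem,
                            std::size_t num_configs,
                            const std::function<double(std::size_t)>& time_config)
    {
        assert(num_configs > 0);
        auto it = cache.find(problem);
        if(it != cache.end())
            return it->second; // already tuned: skip benchmarking
        std::size_t best = 0;
        double best_time = std::numeric_limits<double>::max();
        for(std::size_t i = 0; i < num_configs; ++i)
        {
            double t = time_config(i); // e.g. measured kernel time
            if(t < best_time)
            {
                best_time = t;
                best      = i;
            }
        }
        cache[problem] = best;
        return best;
    }

    private:
    std::map<std::string, std::size_t> cache;
};
```

Keying the cache by a problem description means a model containing many identical gemms pays the benchmarking cost once per distinct problem rather than once per instruction.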

Fix convert operation for NaNs (#1840) · 2d635f91

Umang Yadav authored Jun 17, 2023

* Fix convert for the NaNs

* NaN compares unequal to everything, including itself, so use std::isnan()

* formatting

* formatting

* formatting

* add extra tests
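
Every ordered comparison involving a NaN evaluates to false, so range checks written with `<` and `>` let a NaN fall through to whatever comes next. A minimal sketch of a saturating float-to-int8 conversion, using a hypothetical saturating_convert helper and assuming NaN should map to 0 for integer targets (MIGraphX's actual convert semantics may differ):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical saturating float -> int8_t conversion. A NaN would slip past
// both range checks below (each comparison is false), and casting NaN to an
// integer type is undefined behavior, hence the explicit std::isnan test.
std::int8_t saturating_convert(float x)
{
    if(std::isnan(x))
        return 0; // assumption: NaN maps to 0 for integer targets
    constexpr float lo = std::numeric_limits<std::int8_t>::lowest();
    constexpr float hi = std::numeric_limits<std::int8_t>::max();
    if(x < lo)
        return std::numeric_limits<std::int8_t>::lowest();
    if(x > hi)
        return std::numeric_limits<std::int8_t>::max();
    return static_cast<std::int8_t>(x); // truncates toward zero
}
```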

16 Jun, 2023 1 commit

2+ input multibroadcast and 2+ input dynamic shape insert_common_op (#1836) · 27bb8ca6

Charlie Lin authored Jun 16, 2023



* initial

* Added tests and new functionality

* Update optimals handling

* Simplify conditionals

* Ref test, update docs

* Remove comment, suggestion unclear

---------
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
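
Broadcasting more than two inputs can be done by folding the usual right-aligned rule over all of them. A minimal sketch for static shapes only, using a hypothetical common_lens helper (the commit also handles dynamic shapes, which this sketch does not):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using lens_t = std::vector<std::size_t>;

// Hypothetical helper: NumPy-style broadcast of any number of static shapes.
// Shapes are right-aligned; each output dimension takes the non-1 size shared
// by the inputs, and two different non-1 sizes are an error.
lens_t common_lens(const std::vector<lens_t>& shapes)
{
    std::size_t rank = 0;
    for(const auto& s : shapes)
        rank = std::max(rank, s.size());
    lens_t out(rank, 1);
    for(const auto& s : shapes)
    {
        std::size_t offset = rank - s.size(); // right-align this shape
        for(std::size_t i = 0; i < s.size(); ++i)
        {
            std::size_t& o = out[offset + i];
            if(s[i] == 1 || s[i] == o)
                continue; // compatible, nothing to update
            if(o != 1)
                throw std::invalid_argument("shapes are not broadcastable");
            o = s[i];
        }
    }
    return out;
}
```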

15 Jun, 2023 2 commits

use __hmax, __hmin (#1813) · d208adfc
Umang Yadav authored Jun 15, 2023

fix parse_instancenorm to create broadcast and multibroadcast instruc… (#1715) · 41ba30d5

Brian Pickrell authored Jun 15, 2023

* fix parse_instancenorm to create broadcast and multibroadcast instructions with two dynamic shape arguments instead of one. Their make_op() functions don't support dynamic shapes when called with a single input, which caused an error when parsing the ONNX 3d-unet model

* Use add_common_op() to create multibroadcast op.

* add verification and parsing test for instance_norm with dynamic input.  Parse test doesn't pass.

* fix for test; still doesn't pass

* another fix for test; still doesn't pass

* work in progress, instance_norm_dyn_batch_test works but instance_norm_test doesn't

* fix onnx instancenorm tests to match parser changes.  Passes all check tests

* Updated comments explaining usage of add_common_op()

* hand-merged conflicts with develop

* fix instance_norm_half_test after merge

* add Onnx test instance_norm_dyn_batch_half_test

* add shape test cases broadcast_1in_dyn_error and multibroadcast_1in_dyn_error_0

14 Jun, 2023 2 commits
- Fix TRACE_EVAL > 1 (#1835) · 5bf067ed
  Umang Yadav authored Jun 14, 2023
```
* add fix for the trace_eval

* Add throw for the debug builds

* Formatting

---------
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
```
- Print message from driver if offload copy is set for compiled program (#1802) · aa508e1d
  Umang Yadav authored Jun 14, 2023
  
12 Jun, 2023 1 commit
- Enable reshape on nonstandard shapes (#1681) · 0dae73fa
  Paul Fultz II authored Jun 12, 2023
  
09 Jun, 2023 3 commits
- Enable hipRTC (#1827) · c900e382
  Chris Austen authored Jun 09, 2023
  
- Fix compile warnings for shadowing variable names (#1825) · dfde6d07
  Umang Yadav authored Jun 09, 2023
  
- Add missing specialization for the `nullptr` for the hash function (#1824) · 26aabd2a
  Umang Yadav authored Jun 09, 2023
```
#1791 added a hash function for the value class. It uses the visit function and has specializations for bool_type and vector types, but was missing a specialization for nullptr, which caused compilation failures on RHEL, SLES, and CentOS.
```
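
To illustrate why a visitor-based hash needs a case for every alternative, here is a sketch using std::variant as a hypothetical stand-in for the value class; value_t and hash_value are illustrative names, not MIGraphX's types. The explicit nullptr_t branch means the generic fallback never has to instantiate std::hash<std::nullptr_t>, so the code does not depend on library support for that specialization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical dynamically typed value: null, bool, integer, string, or a
// vector of integers.
using value_t = std::variant<std::nullptr_t, bool, std::int64_t, std::string, std::vector<std::int64_t>>;

// Hash whichever alternative is active by visiting it.
std::size_t hash_value(const value_t& v)
{
    return std::visit(
        [](const auto& x) -> std::size_t {
            using T = std::decay_t<decltype(x)>;
            if constexpr(std::is_same_v<T, std::nullptr_t>)
                return 0; // all null values hash the same
            else if constexpr(std::is_same_v<T, std::vector<std::int64_t>>)
            {
                std::size_t seed = x.size();
                for(auto e : x) // boost-style hash_combine over elements
                    seed ^= std::hash<std::int64_t>{}(e) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
                return seed;
            }
            else
                return std::hash<T>{}(x); // bool, int64_t, string
        },
        v);
}
```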
08 Jun, 2023 2 commits
- Add initial CK integration plus auto-tuning for kernels (#1791) · 25af8710
  Paul Fultz II authored Jun 08, 2023
```
Enable with MIGRAPHX_ENABLE_CK=1 and the --exhaustive-tune flag
```
- disable hipRTC temporarily (#1817) · e5a33aad
  Chris Austen authored Jun 07, 2023
  
06 Jun, 2023 2 commits

re-enable hiprtc (#1812) · 85ff4f85
Umang Yadav authored Jun 06, 2023

Conditionally enable GeLU approximation (#1810) · c5d0c5b6

Umang Yadav authored Jun 05, 2023

The sigmoid approximation for GeLU was introduced in #1299 for FP16. It is known to give better performance but lower accuracy: https://arxiv.org/pdf/1606.08415.pdf
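
For reference, the exact GeLU and the sigmoid approximation from the linked paper can be compared directly. This sketch uses the paper's constant 1.702; whether #1299 uses the same constant is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Exact GeLU: x * Phi(x), where Phi is the standard normal CDF.
double gelu_exact(double x) { return 0.5 * x * (1.0 + std::erf(x / std::sqrt(2.0))); }

// Sigmoid approximation from the paper: x * sigmoid(1.702 * x).
double gelu_sigmoid(double x) { return x / (1.0 + std::exp(-1.702 * x)); }
```

The two curves agree closely but not exactly; the approximation trades that small error for replacing erf with a single exponential.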

05 Jun, 2023 1 commit

Test and doc update for shape.from_permutation() (#1742) · 68446f7a

Charlie Lin authored Jun 05, 2023

Changed the doc for find_permutation(shape) to make it clearer that it finds the permutation that would make the shape standard
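
A minimal sketch of the idea behind find_permutation, assuming it orders axes by descending stride; find_permutation_sketch is a hypothetical reimplementation over raw strides, not MIGraphX's function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Order the axes from largest to smallest stride. Transposing a packed shape
// by this permutation lists its axes from outermost to innermost, which is
// standard (row-major) order.
std::vector<std::int64_t> find_permutation_sketch(const std::vector<std::size_t>& strides)
{
    std::vector<std::int64_t> perm(strides.size());
    std::iota(perm.begin(), perm.end(), 0); // identity: 0, 1, ..., n-1
    // stable_sort keeps the original axis order for equal strides
    std::stable_sort(perm.begin(), perm.end(), [&](std::int64_t a, std::int64_t b) {
        return strides[a] > strides[b];
    });
    return perm;
}
```

For example, a [2, 3, 4] shape with strides [12, 1, 3] yields the permutation [0, 2, 1]; transposing by it gives lens [2, 4, 3] with strides [12, 3, 1], which is standard.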

01 Jun, 2023 1 commit

Convert Fp16 instance-norm to FP32 temporarily (#1779) · 49b341d3

Umang Yadav authored Jun 01, 2023

Converting to FP32: the FP16 3d-unet model's accuracy comes out the same as the FP32 accuracy.

Using the reduce_sum method in FP16: accuracy comes out ~0.9% lower than FP32, while keeping the entire model in FP16.

31 May, 2023 1 commit
- Update pass manager to handle multi-target compilation (#1672) · 9473e3a2
  Umang Yadav authored May 31, 2023
```
partially solves #1656
This PR only handles the compilation part of multi-target.
```
30 May, 2023 2 commits

Improvements to driver output (#1710) · d32ab85b

Paul Fultz II authored May 30, 2023

Use generate_argument instead of generate_literal for Python output, since generate_literal doesn't exist
Shorten the names for variables from the main module
Use prefix p_ for parameters
Use shorter variable m for main module in python

Add option to use type erased matchers to reduce symbol names (#1755) · 55f420fb
Paul Fultz II authored May 30, 2023

28 May, 2023 1 commit
- Enable quantizing both int8 and fp16 in the driver (#1757) · 26c1efa5
  Paul Fultz II authored May 28, 2023
```
* Allow quantizing for both int8 and fp16
```
25 May, 2023 1 commit
- Update cpp generator to handle inf from float (#1758) · 763dd1da
  Ted Themistokleous authored May 25, 2023
```
Use std::numeric_limits::min/max() functions plus the appropriate value to encode -inf/inf 
```
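
A related pitfall when emitting C++ source: printing an infinite float yields text such as `inf`, which is not a valid literal. The sketch below sidesteps this by emitting a std::numeric_limits expression. Note that this is a simpler alternative swapped in for illustration, not the min/max-based encoding the commit describes, and float_literal is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical code-generator helper: returns C++ source text that evaluates
// to `x`. Non-finite values have no literal form, so they are written as
// std::numeric_limits expressions (the generated code must include <limits>).
std::string float_literal(float x)
{
    if(std::isnan(x))
        return "std::numeric_limits<float>::quiet_NaN()";
    if(std::isinf(x))
        return x > 0 ? "std::numeric_limits<float>::infinity()"
                     : "-std::numeric_limits<float>::infinity()";
    std::ostringstream ss;
    // 9 significant digits round-trip any float; showpoint guarantees a
    // decimal point so the trailing 'f' forms a valid float literal.
    ss << std::showpoint;
    ss.precision(9);
    ss << x << 'f';
    return ss.str();
}
```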
24 May, 2023 2 commits
- Change compiler_replace to a class that stores the code objects directly (#1739) · 37f5df20
  Paul Fultz II authored May 24, 2023
```
Enable retrieving the code object to do tuning in the future.
```
- Update xdlops/rocblas fp32 arch (#1752) · 77042e30
  kahmed10 authored May 24, 2023
```
Refactor supported gfx archs
```
23 May, 2023 1 commit
- Backout fp16 max/min HIP API change (#1771) · 42772fd6
  Umang Yadav authored May 23, 2023
```
back out changes for rocm-5.5
```
20 May, 2023 1 commit
- Use half HIP APIs to compute max and min (#1764) · 88fb551c
  Umang Yadav authored May 19, 2023
```
* use half hip functions to compute max and min
* add verify test for min and max
```
19 May, 2023 1 commit
- Enabling native int32 type support (#1721) · 8d9d5d1c
  Zhuoran Yin authored May 19, 2023
```
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
```
17 May, 2023 2 commits

adjust docker files to support new rocm 5.5 (#1729) · 5e35957b
Chris Austen authored May 17, 2023
```
Move CI to support the rocm5.5 release
```

scalar unsqueeze broadcast support (#1753) · 2140fe19

shivadbhavsar authored May 16, 2023

Adding support for broadcasted scalars to unsqueeze op.

Specifying steps other than 1 is disallowed in this implementation since we want the output to always be a tensor. We can support varying step sizes if we allow a broadcasted scalar output from this op.

08 May, 2023 1 commit
- Remove workaround for Sin (#1701) · 89f7ac0d
  Umang Yadav authored May 08, 2023
  
06 May, 2023 1 commit
- Optimize file space of github runners (#1743) · 2bebf64d
  Chris Austen authored May 06, 2023
```
Remove various files not required for what we use GitHub runners for
```
05 May, 2023 3 commits
- Python API update for dynamic batch (#1723) · ccc4b8a4
  Charlie Lin authored May 05, 2023
```
Python API with documentation updates
```
- [MLIR][5.7] add input fusion support for view ops (#1705) · 4996c6d7
  Manupa Karunaratne authored May 05, 2023
```
Adds support for fusing slice, transpose, contiguous, and reshape into the input tensors of a fused MLIR kernel.
```
- add tf supported ops in driver, sort both onnx and tf alphabetically (#1732) · 4fb3fd4a
  kahmed10 authored May 04, 2023
```
add option to print tf supported ops
sort both onnx and tf ops alphabetically
```
04 May, 2023 2 commits

Rewrite multiplies with dot operator (#1685) · 457703a8

Paul Fultz II authored May 04, 2023

When either the input or the output is multiplied across the K dimensions, the multiplier can be applied to the constant instead, which can then be folded with propagate_const.
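
One way to see why this is valid, written as a short derivation under the assumption that the dot is C = AB with A of size M×K and a constant B of size K×N: if A is scaled by a vector v broadcast along its K dimension, the scale can be written as a diagonal matrix and reassociated onto B.

```latex
\big[(A\,\mathrm{diag}(v))\,B\big]_{mn}
  = \sum_{k=1}^{K} (A_{mk}\,v_k)\,B_{kn}
  = \sum_{k=1}^{K} A_{mk}\,(v_k\,B_{kn})
  = \big[A\,(\mathrm{diag}(v)\,B)\big]_{mn}
```

When v is also constant, diag(v)B is a constant expression that propagate_const can fold, leaving a single dot. A scale applied to the output along its N dimension moves into B the same way: (AB) diag(w) = A (B diag(w)).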

[mlir] Adding quant convolution fusion as anchor op (#1683) · 7f105952

Zhuoran Yin authored May 03, 2023

Exposed the mlir_enabled() call to decide whether the lowering pipeline is enabled
Disabled the rewrite quantization pipeline in mlir compilation
Added quant convolution as anchor ops
Fixed the return type expectations
Added the fall back hip implementation for quantizelinear and dequantizelinear
Will need advice on improving the implementation of quantizelinear

03 May, 2023 1 commit

Update C/C++ API for dynamic batch (#1712) · 0ff00ef6

Charlie Lin authored May 02, 2023

Relies on Removed split_single_dyn_dim compile flag #1711
Exposes dynamic_dimension as an opaque object with dynamic_dimensions and optimals
Exposes ONNX dyn_input_dims and default_dyn_dim to run with dynamic batch
Updates api.py to be able to create objects from aggregate initialization (used for dynamic_dimension)
Uses offload copy for now

02 May, 2023 1 commit

Handle broadcasts across dot and concat (#1689) · a8ace295

Paul Fultz II authored May 02, 2023

Improves constant propagation for BERT models: larger batch sizes no longer produce such large constants. Also improves model compilation speed.

28 Apr, 2023 1 commit
- Removed split_single_dyn_dim compile flag (#1711) · bcc1f64a
  Charlie Lin authored Apr 28, 2023
  
25 Apr, 2023 1 commit
- update rocBLAS version check to support 3.0 and above (#1716) · ed6542ee
  kahmed10 authored Apr 25, 2023
```
update rocBLAS version check to support 3.0 and above with simplified logic
```