Commits · a41cd5c0b493bbb7d21078f1a842675ff824d2b7 · gaoqiong / MIGraphX

17 Nov, 2023 1 commit

Ref implementation of FP8 (#2438) · 7f93a818

Umang Yadav authored Nov 17, 2023

Handles all four FP8 dtypes listed at https://onnx.ai/onnx/technical/float8.html
Follows the saturation/clipping logic from the Cast table there as well: https://onnx.ai/onnx/technical/float8.html#cast
Only adding fp8e4m3fnuz in MIGraphX IR for now.
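
As a rough illustration of the fp8e4m3fnuz layout described on the ONNX float8 page, the sketch below decodes a single byte to float. It is not the MIGraphX reference implementation; the function name `decode_fp8e4m3fnuz` is hypothetical, and the sketch covers decoding only, not the saturating cast.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode one fp8e4m3fnuz byte to float, following the ONNX float8 layout:
// 1 sign bit, 4 exponent bits (bias 8), 3 mantissa bits.
// "FN" = finite only (no infinities); "UZ" = unsigned zero, so the bit
// pattern 0x80 (the would-be negative zero) is the single NaN encoding.
// Hypothetical helper for illustration, not a MIGraphX function.
inline float decode_fp8e4m3fnuz(std::uint8_t bits)
{
    if(bits == 0x80)
        return std::numeric_limits<float>::quiet_NaN();

    const int sign     = (bits >> 7) & 0x1;
    const int exponent = (bits >> 3) & 0xF;
    const int mantissa = bits & 0x7;

    float magnitude = 0.0f;
    if(exponent == 0)
        // Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
        magnitude = std::ldexp(static_cast<float>(mantissa) / 8.0f, 1 - 8);
    else
        // Normal: implicit leading 1.
        magnitude = std::ldexp(1.0f + static_cast<float>(mantissa) / 8.0f, exponent - 8);

    return sign != 0 ? -magnitude : magnitude;
}
```

Under this layout the largest finite magnitude is 240 (byte 0x7F), which is the value a saturating cast clips to.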

7f93a818

08 Nov, 2023 2 commits

Fix Round operator inaccuracy (#2244) · 48c4453c

Zakor Gyula authored Nov 08, 2023

The inaccuracy arose because ONNX Round requires rounding halfway (0.5) cases to the nearest even integer.
std::round rounds halfway cases away from zero, thus giving wrong results for them.
Replaced std::round with std::nearbyint, which uses the correct rounding by default.
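
A minimal sketch of the difference, assuming the default floating-point rounding mode (FE_TONEAREST) is in effect. The two wrapper names are hypothetical and exist only to label the behaviours; this is not the MIGraphX operator code.

```cpp
#include <cmath>

// Round half away from zero: the behaviour of std::round.
inline double round_half_away(double x) { return std::round(x); }

// Round half to even under the default rounding mode (FE_TONEAREST):
// the behaviour of std::nearbyint, which matches ONNX Round.
inline double round_half_even(double x) { return std::nearbyint(x); }
```

For an input of 2.5, `round_half_away` gives 3.0 while `round_half_even` gives 2.0; the two agree on every input that is not exactly halfway between two integers.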

48c4453c

Blas auto-tuning for GEMMs (#1668) · d7c8b66f
Brian Pickrell authored Nov 07, 2023

d7c8b66f

30 Oct, 2023 2 commits
- Remove int8x4 format completely (#2373) · 22bb777f
  Umang Yadav authored Oct 30, 2023
  
  22bb777f
- Disable fuse_mlir test if MLIR is not enabled (#2371) · 0e56b124
  Umang Yadav authored Oct 29, 2023
  
  0e56b124
16 Oct, 2023 1 commit

Enable MLIR by default for more cases (#2274) · 650ba45f

Paul Fultz II authored Oct 15, 2023

This will enable MLIR by default for these cases:

Any convolution fusion
Any int8 gemm fusion
All Navi3 standalone convolutions
With a flag (i.e. MIGRAPHX_ENABLE_MLIR) to enable MLIR for floating-point gemm fusions
Except:

3x3 Winograd convolution fusions (except on Navi)
K > 2048 on gemm (as CK)
There is also MIGRAPHX_DISABLE_MLIR to disable MLIR completely.

650ba45f

06 Oct, 2023 1 commit
- prepare for Windows resources with resource script files (#1999) · 9d8331b4
  Artur Wojcik authored Oct 06, 2023
  
  9d8331b4
28 Sep, 2023 1 commit

Add options to set tolerances inside MIGraphX driver (#2213) · 69d8d789

Umang Yadav authored Sep 28, 2023

MIGraphX verification by default uses normalized RMS error as the basis for the verification.  This change adds some logic to allow migraphx to do "np.allclose" type of elementwise verification using atol and rtol.

Commit also includes changes to consistently pass the "gold" or "expected" results as the second argument to "verify_range()" calls. The default RMS tolerance inside the driver is set to 0.001, which IMO is high for FP32 compared to what we had earlier. Better defaults are needed.
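
To make the two verification styles concrete, here is a small sketch. The function names, the default tolerances (NumPy's `np.allclose` defaults) and the particular RMS normalization are assumptions for illustration; the driver's actual implementation may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Elementwise check in the style of np.allclose: every element must satisfy
// |result - gold| <= atol + rtol * |gold|. The gold (expected) values are the
// second argument. Written as !(diff <= tol) so that NaN fails the check.
inline bool allclose(const std::vector<double>& result,
                     const std::vector<double>& gold,
                     double rtol = 1e-5,
                     double atol = 1e-8)
{
    if(result.size() != gold.size())
        return false;
    for(std::size_t i = 0; i < result.size(); ++i)
    {
        const double diff = std::abs(result[i] - gold[i]);
        const double tol  = atol + rtol * std::abs(gold[i]);
        if(!(diff <= tol))
            return false;
    }
    return true;
}

// One possible normalized RMS error (an assumption, not necessarily the
// MIGraphX definition): norm of the differences divided by the norm of gold.
inline double normalized_rms_error(const std::vector<double>& result,
                                   const std::vector<double>& gold)
{
    double diff_sq = 0.0;
    double gold_sq = 0.0;
    for(std::size_t i = 0; i < result.size() && i < gold.size(); ++i)
    {
        diff_sq += (result[i] - gold[i]) * (result[i] - gold[i]);
        gold_sq += gold[i] * gold[i];
    }
    return gold_sq == 0.0 ? std::sqrt(diff_sq) : std::sqrt(diff_sq / gold_sq);
}
```

The two checks can disagree: with gold values {1000, 1000, 0.001} and results {1000, 1000, 0.002}, the aggregate RMS error is far below 0.001, yet the elementwise check fails on the last element. That is the kind of case an atol/rtol option is meant to catch.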

69d8d789

27 Sep, 2023 1 commit
- fix order in layernorm matcher and add test for the same (#2189) · 03d8a250
  Umang Yadav authored Sep 27, 2023
  
  03d8a250
13 Sep, 2023 1 commit
- Disable unsafe buffer usage warning when its available (#2168) · e05c94b4
  Paul Fultz II authored Sep 13, 2023
  
  e05c94b4
18 Aug, 2023 1 commit
- Remove operators.hpp includes (#2086) · e6290061
  Paul Fultz II authored Aug 18, 2023
  
  e6290061
10 Aug, 2023 1 commit

[MLIR] Changes for tuning API v2, MLIR grid layout changes. (#1961) · 065d06af

Krzysztof Drewniak authored Aug 09, 2023

This PR constitutes the MIGraphX-side changes needed to not break the build in the presence of
ROCmSoftwarePlatform/rocMLIR#1136 , and updates what data is sent in to MLIR during the kernel generation and tuning process.

065d06af

06 Aug, 2023 1 commit
- Improve MLIR symbol names (#2025) · 3499bc21
  Paul Fultz II authored Aug 06, 2023
  
  3499bc21
30 Jul, 2023 1 commit

Enable tuning for MLIR (#1965) · be6ecff6

Paul Fultz II authored Jul 30, 2023

* Add initial tuning support

* Format

* Add extra param

* Format

* Use exhaustive flag

* Format

* Set expected shapes

* Format

* Format

* Fix missing symbol

* Format

* Add missing license header

* Format

* Update src/targets/gpu/include/migraphx/gpu/mlir.hpp

be6ecff6

16 Jul, 2023 1 commit
- add verify namespace (#1952) · 68a9a23f
  Umang Yadav authored Jul 16, 2023
  
  68a9a23f
05 Jul, 2023 1 commit

Fix literal rounding in codegen functions (#1910) · 697709a7

kahmed10 authored Jul 05, 2023

Fixes the failing test case in #1815. Added a test that would otherwise fail without the change.

697709a7

29 Jun, 2023 1 commit

Update list of pointwise operators supported by MLIR (#1848) · b3a610df

Krzysztof Drewniak authored Jun 29, 2023

Bump MLIR commit to include latest supported pointwise ops.
Expand the MLIR approve list.
Ensure that operations such as tanh() that don't have integer implementations (at least in MLIR) aren't used within MLIR modules.
Add additional tests.

b3a610df

22 Jun, 2023 1 commit
- [mlir] Adding mlir quant_dot operator support (#1816) · 01342ae1
  Zhuoran Yin authored Jun 22, 2023
```
Add mlir quant_dot operator support
```
  01342ae1
19 May, 2023 1 commit
- Enabling native int32 type support (#1721) · 8d9d5d1c
  Zhuoran Yin authored May 19, 2023
```
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
```
  8d9d5d1c
17 May, 2023 1 commit
- adjust docker files to support new rocm 5.5 (#1729) · 5e35957b
  Chris Austen authored May 17, 2023
```
Move CI to support the rocm5.5 release
```
  5e35957b
04 May, 2023 1 commit

[mlir] Adding quant convolution fusion as anchor op (#1683) · 7f105952

Zhuoran Yin authored May 03, 2023

Exposed the mlir_enabled() call to decide whether the lowering pipeline is enabled
Disabled the rewrite quantization pipeline in mlir compilation
Added quant convolution as anchor ops
Fixed the return type expectations
Added the fallback hip implementation for quantizelinear and dequantizelinear
Will need advice to improve the implementation for quantizelinear

7f105952

13 Apr, 2023 1 commit
- [mlir] Adding quantizelinear, dequantizelinear and quant_convolution support (#1675) · 7b2a5ccf
  Zhuoran Yin authored Apr 13, 2023
  
  7b2a5ccf
06 Apr, 2023 1 commit

Driver dynamic batch update (#1652) · adccec52

Charlie Lin authored Apr 06, 2023

Examples:

bin/driver verify /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim @data "[{min:1, max:4}, 3, 224, 224]"

bin/driver compile /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --default-dyn-dim "{min:1, max:10}" --output resnet50_batch1-10.mxr

bin/driver perf resnet50_batch1-10.mxr --batch 4

adccec52

27 Mar, 2023 1 commit

[MLIR] add dot offloads with manual tuning support (#1631) · 7c4dc99a

Manupa Karunaratne authored Mar 27, 2023

* [MLIR] add dot offloads with manual tuning support
* This commit adds dot + pointwise fusion support
along with manual tuning using rocMLIR.

7c4dc99a

18 Mar, 2023 1 commit
- Dynamically plug-in backend target libs (#1608) · 7a7040aa
  Umang Yadav authored Mar 18, 2023
```
Fixes #1595
```
  7a7040aa
31 Jan, 2023 1 commit

hipRTC fixes (#1531) · 91cc7242

Umang Yadav authored Jan 31, 2023

Added a CMake flag for hipRTC: MIGRAPHX_USE_HIPRTC.
Added stages in Jenkins for hipRTC.
Fixes for some of the pending issues from hipRTC.

91cc7242

06 Dec, 2022 2 commits

Add tupleVisitor for from_gpu (#1465) · a4c2b889

Ted Themistokleous authored Dec 06, 2022

Need this for when we debug and use MIGRAPHX_TRACE_EVAL() to show tuples.
Without this we break when reading our buffer due to the use of visit().
This came up as part of #1283 debugging.

a4c2b889

Update MLIR integration (#1451) · be70702d

jungpark-mlir authored Dec 06, 2022

Update dialect registration interface
Update 2nd build pipeline call and use full arch name

be70702d

27 Oct, 2022 1 commit

Upgrade CI environment to 5.3.0 (#1198) · 4b1c1c41

Chris Austen authored Oct 27, 2022

Upgraded Dockerfiles and fixed tidy issues to make Ubuntu 20.04 and ROCm 5.3.0 the default

4b1c1c41

18 Oct, 2022 1 commit

Add support in mlir for transposed and broadcasted shapes (#1378) · c3e02b18

Paul Fultz II authored Oct 18, 2022



* Enable non-standard shape
* Use perfdb for non xdlops
* Fix transpose+broadcast strides
Co-authored-by: jungpark-mlir <jungwook.park@amd.com>

c3e02b18

13 Oct, 2022 1 commit

Refactor dynamic padding mode (#1387) · 32f6388c

Charlie Lin authored Oct 13, 2022

Removes use_dynamic_same_auto_pad
Change padding_mode to be used for dynamic padding
Move compute_padded_shape to pad_calc.cpp as it will be used in other dynamic padding cases
Fix same_lower compute_padded_shape bug and add a test.

32f6388c

04 Oct, 2022 1 commit
- Stream sync Changeset (#1358) · f7d987ba
  Ted Themistokleous authored Oct 04, 2022
```
Stream sync changes and associated API level changes
```
  f7d987ba
29 Sep, 2022 1 commit

Use find_2.0 API for the convolution (#1346) · e19f78ae

Umang Yadav authored Sep 29, 2022

Improvements/Additions to be made:

changes for the quant_convolution,
changes for the deconvolution,
Macros for MIOpen status checks

e19f78ae

28 Sep, 2022 1 commit

Add compute_fp32 flag for quant_gemm tests (#1360) · 70e63960

Umang Yadav authored Sep 28, 2022

test_gpu_pack_int8_args fails on gfx908 machines because it doesn't set the compute_fp32 flag correctly. This PR fixes the test so that it checks the device name and rocBLAS version and sets the flag accordingly.

70e63960

27 Sep, 2022 1 commit
- Add onnx mod operator gpu cpu (#1306) · 40118191
  Ted Themistokleous authored Sep 26, 2022
```
Implement operator for CPU and GPU implementations
```
  40118191
23 Sep, 2022 1 commit
- Remove unused device functions (#1394) · 8ea8473d
  Paul Fultz II authored Sep 23, 2022
```
* Remove device functions
* Update tests
```
  8ea8473d
16 Sep, 2022 1 commit
- Fix typo for add_sigmoid (#1385) · 10f37f49
  Umang Yadav authored Sep 16, 2022
```
* fix typo for add_sigmoid
```
  10f37f49
15 Sep, 2022 1 commit

[mlir] Replaced `find_library` with `find_package` to locate MLIR static library (#1373) · e1e36cdc

Lixun Zhang authored Sep 15, 2022

* Replaced `find_library` with `find_package` to locate MLIR static library
* Unified the include dir for headers and remove backward compatibility
* Embedded the external/include dir into the exported library

e1e36cdc

04 Aug, 2022 1 commit

Dynamic ref convolution op (#1224) · 67f77ac1

Charlie Lin authored Aug 04, 2022



* Dynamic shape handling in shape object

* rewrite empty lens multibroadcast test

* Shape class changes to handle dynamic
* More throw errors for functions that don't make sense for dynamic shape
* Print output changes
* Serialization changes

* Fixing serialization errors

* Remove const on dyn_dim copy getters

* Dynamic shape tests

* Fix serialize errors

* Add dyn_data struct to avoid ambiguous constructor

* Tidy fix: emplace_back() over for loop

* Tidy fix: use move

* Use std::initializer_list in constructor
Reverts the dyn_data struct change
Should get around the ambiguous braced initialization list error

* avoid typedef

* element_space, min,max,opt _lens change

* formatting

* Comments fix

* dynamic bytes() test

* Serialize and reflect changes

* formatting

* Test the dynamic lens functions

* progress

* Formatting

* Dynamic conv draft progress

* Add operator<< tests for coverage

* Coverage update

* Add to conv dynamic batch test

* Dynamic image size test

* Dynamic weight handling

* Dyn image shape test change, fix dyn weight cond

* Comment update

* Dynamic weights shape test and fix

* Use ternary operator

* Tidy fixes

* Handle dynamic graph input shapes in ONNX parser

* Formatting

* Handle dynamic shape for convolution

* formatting

* cppcheck fixes

* Add onnx test files

* Fix typo

* Disable auto_pad for dynamic input shape

* check_shapes object checks for allowing dynamic shapes

* Fix any_of

* Change to maintain const objectness

* Formatting

* Check shapes allow dynamic

* Refactor compute_shape() call into op.compute()
Allows for per operator differences with handling dynamic shape
Fix operation.hpp change to use the generator

* Comment fix

* Refactor normalize_attributes() calls to use max_lens()

* Comment addition

* Update other normalize_attributes() calls

* Change to using constructor and add tests

* Use const member function

* Add more dynamic shape support

* Add tests for error code coverage

* Fix opt shape bug and add shape tests

* capture all by ref

* Fix typo with img shape calculation

* Add more tests

* dynamic auto pad attempt
Linker error with pad_calc.cpp

* Fix parse dyn auto_pad
Should only need to use dynamic auto pad when the image shape or kernel
shape are dynamic. For a dynamic batch size, the auto pad calculation is
the same.

* Fix linking error

* Fix auto_pad bug
Fixed input tensor with auto_pad setting on

* auto_pad onnx tests

* Fix auto_pad calculation, evaluate in ref_conv
add ref_ops tests

* Add shape tests, fix bugs

* Refactor first two output dynamic len calculation

* Conv MLIR test update

* i64 MLIR test fix

* Fix MLIR test typo
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

67f77ac1

29 Jul, 2022 1 commit

Avoid registering host buffer ptr multiple times during hip copies (#1245) · 7596f3f1

Umang Yadav authored Jul 29, 2022

Currently, while copying a host buffer to the device, it first registers/maps the host buffer pointer to the address space of the device.

If the host buffer has been allocated by hipHostMalloc, it is already implicitly registered to the device's address space and does not need to be registered again. This PR adds a check for that.

7596f3f1