Commits · 9c46821c768e04d9b80cd9ba8dacb923b962cd9e · gaoqiong / MIGraphX

22 Nov, 2023 1 commit
- Use double buffer for block_scan (#2436) · ee257d99
  Paul Fultz II authored Nov 22, 2023
  
  ee257d99
17 Nov, 2023 1 commit

Ref implementation of FP8 (#2438) · 7f93a818

Umang Yadav authored Nov 17, 2023

Handles all four FP8 dtypes listed at https://onnx.ai/onnx/technical/float8.html
Follows the saturation/clipping logic from the cast table there: https://onnx.ai/onnx/technical/float8.html#cast
Only fp8e4m3fnuz is added to the MIGraphX IR for now.

7f93a818
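
As a rough illustration of the fp8e4m3fnuz format named in the commit above, the sketch below decodes a single byte into a float. It follows the layout given on the linked ONNX float8 page (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 8, no infinities, `0x80` as the only NaN). The function name `decode_fp8e4m3fnuz` is hypothetical and is not taken from the MIGraphX sources.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper, not MIGraphX API: decode one fp8e4m3fnuz byte to float.
// Layout: 1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 8.
// The format has no infinities and no negative zero; 0x80 is the only NaN.
inline float decode_fp8e4m3fnuz(std::uint8_t bits)
{
    if(bits == 0x80)
        return std::numeric_limits<float>::quiet_NaN();

    const float sign   = (bits & 0x80) ? -1.0f : 1.0f;
    const int exponent = (bits >> 3) & 0x0F;
    const int mantissa = bits & 0x07;

    if(exponent == 0)
    {
        // Subnormal: no implicit leading one, exponent fixed at 1 - bias.
        return sign * std::ldexp(static_cast<float>(mantissa) / 8.0f, 1 - 8);
    }
    // Normal: implicit leading one, exponent offset by the bias.
    return sign * std::ldexp(1.0f + static_cast<float>(mantissa) / 8.0f, exponent - 8);
}
```

The largest finite value in this encoding is `0x7F`, which decodes to 240; a saturating cast clamps out-of-range inputs to that magnitude rather than producing an infinity.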

30 Oct, 2023 1 commit
- Remove int8x4 format completely (#2373) · 22bb777f
  Umang Yadav authored Oct 30, 2023
  
  22bb777f
20 Oct, 2023 1 commit
- Add support for select_last_index attribute for ArgMax & ArgMin (#2235) · 6ae4227a
  Zakor Gyula authored Oct 20, 2023
  
  6ae4227a
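
For readers unfamiliar with the ONNX `select_last_index` attribute, here is a minimal sketch of its effect on a 1-D ArgMax. It only changes which index is returned when the maximum occurs more than once. The free function `argmax` below is an illustration, not the MIGraphX implementation.

```cpp
#include <cstddef>
#include <vector>

// Illustrative 1-D ArgMax over a non-empty vector.
// select_last_index == false: the first occurrence of the maximum wins.
// select_last_index == true:  the last occurrence of the maximum wins.
inline std::size_t argmax(const std::vector<float>& v, bool select_last_index)
{
    std::size_t best = 0;
    for(std::size_t i = 1; i < v.size(); ++i)
    {
        // A strictly larger value always wins; a tie wins only in "last index" mode.
        if(v[i] > v[best] or (select_last_index and v[i] == v[best]))
            best = i;
    }
    return best;
}
```

ArgMin behaves the same way with the comparison reversed.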
06 Oct, 2023 1 commit
- add missing DLL symbols exports (#2281) · 1082f667
  Artur Wojcik authored Oct 07, 2023
  
  1082f667
03 Oct, 2023 1 commit
- just use one flush call (#2272) · 36eaf9e5
  Umang Yadav authored Oct 02, 2023
  
  36eaf9e5
16 Sep, 2023 1 commit
- Improve error message when launching kernels fails (#2076) · 707602d7
  Paul Fultz II authored Sep 16, 2023
```
Let the user know which targets MIGraphX was built for and how to build MIGraphX for their GPU.
```
  707602d7
13 Aug, 2023 1 commit
- Check for device kernel launch error message (#2066) · 1354c869
  Umang Yadav authored Aug 13, 2023
  
  1354c869
08 Aug, 2023 1 commit
- Update to Cppcheck 2.11 (#1914) · a359d2c8
  Paul Fultz II authored Aug 08, 2023
  
  a359d2c8
08 Jun, 2023 1 commit
- Add initial CK integration plus auto-tuning for kernels (#1791) · 25af8710
  Paul Fultz II authored Jun 08, 2023
```
Enable with MIGRAPHX_ENABLE_CK=1 and the --exhaustive-tune flag
```
  25af8710
19 May, 2023 1 commit
- Enabling native int32 type support (#1721) · 8d9d5d1c
  Zhuoran Yin authored May 19, 2023
```
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
```
  8d9d5d1c
24 Apr, 2023 1 commit

Dynamic shape hip::copy_to_gpu and hip::copy_from_gpu (#1694) · 84acaea0

Charlie Lin authored Apr 24, 2023

Updates the hip::copy_to_gpu and hip::copy_from_gpu operators to work with dynamic shapes

Allows for offload_copy to be used with dynamic batch

Changed the assert in select_module because the argument might now be smaller, given how offload_copy works with dynamic batch (the maximum buffer size will be used).

84acaea0

16 Feb, 2023 1 commit
- Remove HCC (#1546) · bfd77388
  Umang Yadav authored Feb 16, 2023
```
* deprecate HCC
```
  bfd77388
31 Jan, 2023 1 commit

hipRTC fixes (#1531) · 91cc7242

Umang Yadav authored Jan 31, 2023

Added a CMake flag for hipRTC: MIGRAPHX_USE_HIPRTC.
Added stages in Jenkins for hipRTC.
Fixes for some of the pending issues from hipRTC.

91cc7242

23 Sep, 2022 1 commit
- Remove unused device functions (#1394) · 8ea8473d
  Paul Fultz II authored Sep 23, 2022
```
* Remove device functions
* Update tests
```
  8ea8473d
06 Sep, 2022 1 commit
- Enable cppcheck rule for 'not', 'or' keywords (#1361) · d37a4df9
  Paul Fultz II authored Sep 06, 2022
```
Using not and or improves readability. The cppcheck rule will help ensure we are doing it consistently.
```
  d37a4df9
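
To make the style rule in the commit above concrete, the sketch below writes the same predicate twice: once with the symbolic operators and once with the standard C++ alternative tokens `not` and `or`. The function names and the predicate itself are made up for illustration.

```cpp
#include <string>

// Symbolic operators.
inline bool is_skippable_symbolic(const std::string& name, int count)
{
    return name.empty() || !(count > 0);
}

// Same logic with the alternative tokens `or` and `not`,
// which are part of standard C++ and need no extra header.
inline bool is_skippable_keywords(const std::string& name, int count)
{
    return name.empty() or not(count > 0);
}
```

Both functions compile to the same logic; the rule is purely about readability and consistency.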
22 Jun, 2022 1 commit
- Update license files (#1248) · e44cecbc
  Ted Themistokleous authored Jun 22, 2022
```
Updated each source file in the repo with the existing license.
```
  e44cecbc
11 Apr, 2022 1 commit

fix a bug in create tensor_view with vec data type (#1155) · 3c301efa

Shucai Xiao authored Apr 11, 2022

When creating a tensor_view with a vector data type, the last dimension of the shape should be divided by the vec_size.

3c301efa
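
The shape adjustment described in the commit above can be sketched as follows. The helper `vectorized_lens` is hypothetical and not part of MIGraphX; it assumes the last dimension is a multiple of the vector width, so that each vector element packs `vec_size` scalars.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not MIGraphX API: given the dimensions of a scalar
// tensor and a vector width, return the dimensions seen when every element
// of the view packs vec_size scalars along the last axis.
inline std::vector<std::size_t> vectorized_lens(std::vector<std::size_t> lens,
                                                std::size_t vec_size)
{
    if(lens.empty() or vec_size == 0 or lens.back() % vec_size != 0)
        throw std::invalid_argument("last dimension must be a multiple of vec_size");
    lens.back() /= vec_size; // only the innermost dimension shrinks
    return lens;
}
```

For example, a `{2, 3, 8}` tensor viewed with a vector width of 4 has dimensions `{2, 3, 2}`.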

18 Mar, 2022 1 commit

Complete GPU implementation of CumSum op (#1094) · 548783c8

turneram authored Mar 18, 2022

Add exclusive and reverse modes to gpu implementation of prefix_scan_sum, which completes support for ONNX op CumSum

548783c8
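
The exclusive and reverse modes mentioned in the commit above follow the ONNX CumSum semantics. The sequential CPU sketch below is only a reference for those semantics; it is not the GPU prefix_scan_sum kernel, and the function name `cumsum` is chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Sequential reference for 1-D CumSum.
// exclusive: each output excludes its own input element.
// reverse:   the sum runs from the last element toward the first.
inline std::vector<int> cumsum(const std::vector<int>& x, bool exclusive, bool reverse)
{
    const std::size_t n = x.size();
    std::vector<int> out(n);
    int running = 0;
    for(std::size_t k = 0; k < n; ++k)
    {
        // Walk the input backwards in reverse mode.
        const std::size_t i = reverse ? n - 1 - k : k;
        if(exclusive)
        {
            out[i] = running; // write before adding the current element
            running += x[i];
        }
        else
        {
            running += x[i];
            out[i] = running; // write after adding the current element
        }
    }
    return out;
}
```

For the input `{1, 2, 3}` the four mode combinations give `{1, 3, 6}` (default), `{0, 1, 3}` (exclusive), `{6, 5, 3}` (reverse), and `{5, 3, 0}` (exclusive and reverse).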

14 Mar, 2022 1 commit
- Increase max groups in kernel (#1120) · d353641d
  Shucai Xiao authored Mar 14, 2022
```
change max number of groups in a kernel to 1B for greater performance
```
  d353641d
02 Mar, 2022 1 commit
- Clang format ver10 (#1106) · 9852aaef
  bpickrel authored Mar 02, 2022
```
Update the base version of clang-format from 5.0 to 10.0
```
  9852aaef
09 Dec, 2021 1 commit

Softmax perf optimization (#1014) · 2e337c7f

Shucai Xiao authored Dec 09, 2021

Changed the number of threads in a block from 256 to 128
Increased the max number of blocks in the kernel from 256 to 1M.
For the case that the axis is the last dimension, we removed the computation of index since it is not required.

With these changes, we get about a 2x speedup compared to the develop branch for the softmax op used in the BertSquad model.

2e337c7f

08 Oct, 2021 1 commit

Nonzero op extension (#870) · 0879b5f1

Shucai Xiao authored Oct 08, 2021

This PR is for the nonzero operator with static output shape.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

0879b5f1

01 Oct, 2021 1 commit

Add multinomial op (#954) · 0b7672d7

turneram authored Oct 01, 2021

Add multinomial op to onnx parser with ref and GPU implementations.

The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative distribution function (CDF). The multinomial operator multiplies the random values by the sum of the CDF and returns the index of the first element of the CDF that is greater than the result, representing samples randomly drawn from [0, class_size) that follow the log-probability distribution.

Resolves #821
Co-authored-by: Shucai Xiao <shucai@gmail.com>

0b7672d7
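
The sampling step described in the commit above can be sketched for a single draw as follows. This is an illustration under assumptions: "the sum of the CDF" is read here as the last (total) element of an unnormalized running sum, and the name `sample_index` is hypothetical rather than taken from MIGraphX.

```cpp
#include <cstddef>
#include <vector>

// Single multinomial draw by inverse-CDF lookup.
// cdf: non-empty running sums of the (unnormalized) class weights;
//      the last element is the total mass.
// u:   uniform random value in [0, 1).
inline std::size_t sample_index(const std::vector<float>& cdf, float u)
{
    // Scale the uniform value by the total mass instead of normalizing the CDF.
    const float threshold = u * cdf.back();
    for(std::size_t i = 0; i < cdf.size(); ++i)
    {
        // First class whose cumulative weight exceeds the threshold.
        if(cdf[i] > threshold)
            return i;
    }
    return cdf.size() - 1; // guard against floating-point rounding
}
```

With class weights `{1, 2, 3}` the running sums are `{1, 3, 6}`, so `u = 0.2` selects class 1 and `u = 0.5` selects class 2.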

27 Sep, 2021 1 commit

Dpp opts for wavefront 32 (#951) · 6e2df9de

kahmed10 authored Sep 27, 2021

Checks wavefront size, then changes implementation and number of threads for DPP reduce

6e2df9de

16 Sep, 2021 1 commit

Loop operator (#853) · a275f590

Shucai Xiao authored Sep 16, 2021

Add Loop operator for opset version 13.
Notes:
1) The default max iteration number is 10 if none is provided.
2) To change the max iteration number, a user can set max_loop_iterations in the onnx_option struct when parsing a model.
3) The returned shape of the scan output is derived from max_loop_iterations even if the actual number of loop iterations is smaller. This issue also applies to other operators like NonZero and NonMaxSuppression. Issue #948 was created to track this, to be resolved later.
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

a275f590

02 Sep, 2021 2 commits

Refactor where op (#918) · ebbaf8fc

turneram authored Sep 02, 2021

Implement the Where operator for the CPU and GPU.  This is for better performance.

ebbaf8fc

Topk op (#877) · 521b57a2

Shucai Xiao authored Sep 01, 2021



* add topk operator for ref, cpu and gpu
* Hash modules for quicker lookup of modules
* add onnx unit test
* add unit tests for the topk operator
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

521b57a2

09 Aug, 2021 1 commit
- check for divisor encodable or not, fallback if needed (#906) · a8d86615
  Cagri Eryilmaz authored Aug 09, 2021
```
* check for divisor encodable or not, fallback if needed

* verify test for retinaface case
```
  a8d86615
09 Jul, 2021 1 commit
- fix review comments · 4e861c7c
  Shucai Xiao authored Jul 08, 2021
  
  4e861c7c
08 Jul, 2021 3 commits

Add inclusive scan on the GPU (#872) · 6ba279cc

Paul Fultz II authored Jul 08, 2021



* Add initial scan operator

* Formatting

* Fix with a working test

* Fix bugs

* Formatting

* Formatting

* Simplify

* Formatting

* Use non-power of 2 for test

* Make pointer
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

6ba279cc

clang format · 4a24a2dd
Shucai Xiao authored Jul 08, 2021

4a24a2dd
fix review comments · 0281a411
Shucai Xiao authored Jul 08, 2021

0281a411

25 Jun, 2021 4 commits
- clang format · 27c0ae08
  Shucai Xiao authored Jun 25, 2021
  
  27c0ae08
- refine unit tests · dd651742
  Shucai Xiao authored Jun 25, 2021
  
  dd651742
- clang format · 7135da72
  Shucai Xiao authored Jun 25, 2021
  
  7135da72
- gpu implementation of the scatter operator · 973cafd4
  Shucai Xiao authored Jun 25, 2021
  
  973cafd4
11 Jun, 2021 1 commit
- Disable dpp for wavefront 32 (#857) · 764b1e44
  Paul Fultz II authored Jun 11, 2021
```
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
```
  764b1e44
08 Jun, 2021 1 commit

Reverse Op (#846) · 9c54fc4f

Cagri Eryilmaz authored Jun 08, 2021



* init reverseOp branch: ref op + ref test. WIP

* first passing basic test

* cleanup

* additional axis implementation

* additional test

* ref op implementation vec to int for axis

* ref op test change for axis

* initial gpu files and test

* updates to implementation and test

* fixed some issues

* clang format

* cleanup

* formatting

* removing comments

* remove local size, back to default

* update tests: replace with std functions

* multiple axis for reverse op

* fix a build error

* clang format

* more tests

* fix a bug for the reverse device function

* clang format

* fix a bug

* clang format

* ref test updates, multiaxis

* formatting
Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

9c54fc4f

03 May, 2021 1 commit

Reduce types generated for hip kernels (#814) · 3becd974

Paul Fultz II authored May 03, 2021



* Remove unused data types

* Formatting

* Reduce types generated for hip kernels

* Formatting

* Fix onnx tests

* Formatting
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

3becd974