Commits · 2e337c7fc4eb42c76d548f74cfc7bb5a93740fde · gaoqiong / MIGraphX

09 Dec, 2021 1 commit

Softmax perf optimization (#1014) · 2e337c7f

Shucai Xiao authored Dec 09, 2021

Changed the number of threads in a block from 256 to 128.
Increased the max number of blocks in the kernel from 256 to 1M.
When the axis is the last dimension, the index computation is removed, since it is not required.

With these changes, we can get about a 2x speedup compared to the develop branch for the softmax op used in the BertSquad model.
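
The last-axis shortcut described in this commit can be illustrated with a minimal serial CPU sketch. This is not the GPU kernel from the commit: the function name `softmax_last_axis` and the row-major `[rows x row_len]` layout are assumptions made for illustration. When the softmax axis is the innermost one, element `i` of row `r` sits at `r * row_len + i`, so no multi-dimensional index has to be computed.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: softmax over the last axis of a row-major tensor
// viewed as [rows x row_len]. Each row is contiguous in memory, so a plain
// pointer offset replaces any multi-dimensional index computation.
std::vector<float> softmax_last_axis(const std::vector<float>& in, std::size_t row_len)
{
    std::vector<float> out(in.size());
    if(row_len == 0)
        return out;
    const std::size_t rows = in.size() / row_len;
    for(std::size_t r = 0; r < rows; ++r)
    {
        const float* x = in.data() + r * row_len;
        float* y       = out.data() + r * row_len;
        // Subtract the row maximum before exponentiating for numerical stability.
        const float max_val = *std::max_element(x, x + row_len);
        float sum           = 0.0f;
        for(std::size_t i = 0; i < row_len; ++i)
        {
            y[i] = std::exp(x[i] - max_val);
            sum += y[i];
        }
        for(std::size_t i = 0; i < row_len; ++i)
            y[i] /= sum;
    }
    return out;
}
```

For any other axis, the elements of one reduction are strided rather than contiguous, which is where an explicit index computation would still be needed.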

12 Feb, 2020 1 commit

Fix HIP-Clang GPU build issues (#384) · ba07b221

Aaron Enye Shi authored Feb 12, 2020



* Fix HIP-Clang GPU build issues

Add missing device attributes for GPU functions. GPU functions must be annotated with __device__ in HIP.

* Use HIP device function max and min

* Fix clang-format-5.0 issues

* Undo change that breaks on HIP-HCC

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>


20 Dec, 2019 1 commit

Improve operators for onnxruntime (#405) · 992666e6

Shucai Xiao authored Dec 20, 2019



* improve unsqueeze to support negative axis and parsing scalar

* clang format

* add a test example for the negative axis of unsqueeze

* improve the squeeze operator to support negative axis

* clang format

* fixed a small bug in the lrn implementation

* clang format

* support negative axis in argmax and argmin

* clang format

* improve flatten to support negative axis

* clang format

* change softmax/logsoftmax to support negative axis

* clang format

* improve transpose by adding default perm

* clang format

* add one more dimens for tensor size

* add one more dimens for tensor size

* disable conv ops fusion for non-symmetric cases

* clang format

* fixed review comments

* move computing axis from the device function to the compute function

* clang format

* move computing axis from device function to the operator computing function

* clang format

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
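
Negative-axis support of the kind listed above usually comes down to one normalization step performed on the host before any device code runs. The following is a minimal sketch under that assumption; `normalize_axis` is a hypothetical helper name, not necessarily what the commit uses.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: maps an axis in [-rank, rank) to [0, rank).
// A negative axis counts from the back, so -1 means the last dimension.
inline std::int64_t normalize_axis(std::int64_t axis, std::int64_t rank)
{
    if(axis < -rank || axis >= rank)
        throw std::out_of_range("axis out of range for tensor rank");
    return axis < 0 ? axis + rank : axis;
}
```

Resolving the axis once in the operator's compute function means the device function only ever sees a non-negative axis, which matches the intent of the last two items in the list.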


15 Nov, 2019 1 commit

Add option to do offload copying automatically (#403) · 81b0ff5d

Paul Fultz II authored Nov 15, 2019

* Add compiler options

* Add copy operators

* Formatting

* Use run_passes in tests

* Formatting

* Use run_pass in schedule test

* Formatting

* Add compile_options to get_passes in target

* Formatting

* Offload copy option

* Formatting

* Copy using pinned memory

* Formatting

* Improve performance of gpu copying

* Formatting

* Don't copy

* Formatting

* Always make an extra copy

* Formatting

* Remove unused write op

* Add missing include

* Remove copy_to_gpu function in python api

* Make offload copy disabled by default on C++

* Formatting

* Fix tidy issues

* Formatting

* Fix namespace

* Fix python tests

* Turn clang format off since it's broken

* Fix compile error on gcc 5

* Remove commented code


15 Oct, 2019 1 commit
- Use 32-bit integers for index calculations on the gpu (#387) · 193a3b7c
  Paul Fultz II authored Oct 15, 2019
```
* use 32bit integers for indices

* Formatting

* Update more index types

* Formatting
```
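
As a rough illustration of 32-bit index arithmetic, the sketch below narrows an element count to 32 bits on the host and then decomposes a flat index using only 32-bit division and modulo. The alias `index_int` and both helper names are assumptions for this example and are not taken from the commit. The usual motivation is that 64-bit integer division tends to be considerably slower than 32-bit division on GPUs.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

using index_int = std::uint32_t; // assumed name for the 32-bit index type

// Narrow a host-side element count to 32 bits, rejecting tensors that
// cannot be addressed with a 32-bit index.
inline index_int to_index_int(std::size_t n)
{
    if(n > std::numeric_limits<index_int>::max())
        throw std::overflow_error("tensor too large for 32-bit indexing");
    return static_cast<index_int>(n);
}

// Decompose a flat row-major index into 3 coordinates using only
// 32-bit division and modulo.
inline std::array<index_int, 3> multi_index(index_int flat,
                                            const std::array<index_int, 3>& lens)
{
    std::array<index_int, 3> idx{};
    for(int d = 2; d >= 0; --d)
    {
        idx[d] = flat % lens[d];
        flat /= lens[d];
    }
    return idx;
}
```
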
28 Jun, 2019 2 commits
- clang format · a7a686d5
  Shucai Xiao authored Jun 28, 2019
- further factor softmax/logsoftmax gpu implementation · 8ce6758a
  Shucai Xiao authored Jun 28, 2019
26 Jun, 2019 4 commits
- further code cleanup · 8817e238
  Shucai Xiao authored Jun 26, 2019
- code backup · 3e70d01b
  Shucai Xiao authored Jun 26, 2019
- clang format · 22500e6c
  Shucai Xiao authored Jun 25, 2019
- add std namespace for size_t · ea932b63
  Shucai Xiao authored Jun 25, 2019
25 Jun, 2019 12 commits
- minor changes of device function signature. · 4479fc3c
  Shucai Xiao authored Jun 25, 2019
- clang format · bfa455a1
  Shucai Xiao authored Jun 25, 2019
- more optimization of reduce operation. · 605cce41
  Shucai Xiao authored Jun 25, 2019
- clang format · 42b24bd1
  Shucai Xiao authored Jun 25, 2019
- fix build errors. · b9575730
  Shucai Xiao authored Jun 25, 2019
- code cleanup · 1a51797e
  Shucai Xiao authored Jun 25, 2019
- clang format · b6786993
  Shucai Xiao authored Jun 25, 2019
- code refactor. · ccdacf44
  Shucai Xiao authored Jun 25, 2019
- clang format · 17a269a4
  Shucai Xiao authored Jun 24, 2019
- code cleanup for softmax. · 63773ec0
  Shucai Xiao authored Jun 24, 2019
- clang format · 6ae2f087
  Shucai Xiao authored Jun 24, 2019
- simplify the code for softmax gpu implementation. · aeb02070
  Shucai Xiao authored Jun 24, 2019
24 Jun, 2019 4 commits
- clang format · ee877777
  Shucai Xiao authored Jun 24, 2019
- further refactoring of softmax and logsoftmax. · 8724242e
  Shucai Xiao authored Jun 24, 2019
- clang format · 6d1c23e9
  Shucai Xiao authored Jun 24, 2019
- further optimization of the softmax and logsoftmax operator. · b8782a5f
  Shucai Xiao authored Jun 24, 2019
22 Jun, 2019 2 commits
- Formatting · 2220bd25
  Paul authored Jun 21, 2019
- Refactor softmax · c9e391fe
  Paul authored Jun 21, 2019
21 Jun, 2019 1 commit
- optimize softmax gpu implementation. · b58ec6a8
  Shucai Xiao authored Jun 21, 2019
30 May, 2019 2 commits
- clang format · 7f7cbbc0
  Shucai Xiao authored May 30, 2019
- change the gpu implementation of the softmax. · 88351f31
  Shucai Xiao authored May 30, 2019
24 May, 2019 2 commits
- formatting · fd74c021
  Khalique authored May 23, 2019
- fix test cases, revert code · 4c1e707b
  Khalique authored May 23, 2019
23 May, 2019 2 commits
- formatting · 12a79223
  Khalique authored May 23, 2019
- initial testing for softmax · 2ebb3515
  Khalique authored May 23, 2019
26 Feb, 2019 2 commits
- clang format · 0b769919
  Shucai Xiao authored Feb 26, 2019
- add the gpu implementation for the logsoftmax operator · 6dc749f3
  Shucai Xiao authored Feb 26, 2019