Commits · ec205c542be00497f4503fb8d663699400dc61ce · gaoqiong / MIGraphX

10 Mar, 2022 2 commits
- clang format · ec205c54
  Shucai Xiao authored Mar 09, 2022
  
  ec205c54
- backup softmax changes · 1da02b0f
  Shucai Xiao authored Mar 09, 2022
  
  1da02b0f
08 Mar, 2022 3 commits
- final version of softmax that works · 9f06859b
  Shucai Xiao authored Mar 08, 2022
  
  9f06859b
- version that softmax half2 works · bc9eac75
  Shucai Xiao authored Mar 08, 2022
  
  bc9eac75
- fix bugs in softmax half2 implementation · 23a18b2b
  Shucai Xiao authored Mar 08, 2022
  
  23a18b2b
07 Mar, 2022 2 commits
- clang format · 37f63907
  Shucai Xiao authored Mar 07, 2022
  
  37f63907
- backup code changes related to softmax · 45da3115
  Shucai Xiao authored Mar 07, 2022
  
  45da3115
08 Feb, 2022 2 commits
- formatting · 96c82f21
  Khalique Ahmed authored Feb 07, 2022
  
  96c82f21
- use other device name function · cb965031
  Khalique Ahmed authored Feb 07, 2022
  
  cb965031
31 Jan, 2022 1 commit
- formatting · 8d21ccdf
  Khalique Ahmed authored Jan 31, 2022
  
  8d21ccdf
09 Dec, 2021 1 commit

Softmax perf optimization (#1014) · 2e337c7f

Shucai Xiao authored Dec 09, 2021

Changed the number of threads per block from 256 to 128.
Increased the maximum number of blocks in the kernel from 256 to 1M.
When the axis is the last dimension, removed the index computation, since it is not required in that case.

With these changes, the softmax op used in the BertSquad model runs about 2x faster than on the develop branch.

2e337c7f
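
The last-axis shortcut described in the commit above can be illustrated with a small CPU sketch. This is not the MIGraphX kernel: `softmax_last_axis` and its parameters are hypothetical names chosen for illustration, and the GPU launch configuration (threads per block, block count) is not modeled. It only shows why no per-element multi-index is needed when the reduction axis is the innermost one of a row-major tensor: each row is contiguous, so its start offset is a single multiply.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not MIGraphX code): softmax over the last axis of a
// row-major buffer viewed as [rows x row_len].
std::vector<float> softmax_last_axis(const std::vector<float>& x, std::size_t row_len)
{
    assert(row_len > 0 && x.size() % row_len == 0);
    std::vector<float> y(x.size());
    const std::size_t rows = x.size() / row_len;
    for(std::size_t r = 0; r < rows; ++r)
    {
        // Row r is a contiguous range, so its start is one multiply;
        // no multi-dimensional index has to be computed per element.
        const float* in = x.data() + r * row_len;
        float* out      = y.data() + r * row_len;

        // Subtract the row maximum before exponentiating for numerical stability.
        const float m = *std::max_element(in, in + row_len);
        float sum     = 0.0f;
        for(std::size_t i = 0; i < row_len; ++i)
        {
            out[i] = std::exp(in[i] - m);
            sum += out[i];
        }
        // Normalize so that each row sums to 1.
        for(std::size_t i = 0; i < row_len; ++i)
            out[i] /= sum;
    }
    return y;
}
```

For a softmax over any other axis, consecutive elements of a reduction are strided rather than contiguous, which is where the extra index arithmetic would come back in.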

27 Apr, 2021 1 commit
- change softmax block size · 32b69ceb
  Khalique Ahmed authored Apr 26, 2021
  
  32b69ceb
12 Feb, 2020 1 commit

Fix HIP-Clang GPU build issues (#384) · ba07b221

Aaron Enye Shi authored Feb 12, 2020



* Fix HIP-Clang GPU build issues

Add missing device attributes for GPU functions. GPU functions must be annotated with __device__ in HIP.

* Use HIP device function max and min

* Fix clang-format-5.0 issues

* Undo change that breaks on HIP-HCC

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

ba07b221

20 Dec, 2019 1 commit

Improve operators for onnxruntime (#405) · 992666e6

Shucai Xiao authored Dec 20, 2019



* improve unsqueeze to support negative axis and parsing scalar

* clang format

* add a test example for the negative axis of unsqueeze

* improve the squeeze operator to support negative axis

* clang format

* fixed a small bug in the lrn implementation

* clang format

* support negative axis in argmax and argmin

* clang format

* improve flatten to support negative axis

* clang format

* change softmax/logsoftmax to support negative axis

* clang format

* improve transpose by adding default perm

* clang format

* add one more dimension for tensor size

* add one more dimension for tensor size

* disable conv ops fusion for non-symmetric cases

* clang format

* fixed review comments

* move computing axis from the device function to the compute function

* clang format

* move computing axis from device function to the operator computing function

* clang format

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

992666e6
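
Several items in the commit above add negative-axis support to operators such as softmax, squeeze, and argmax. The usual convention, assumed here rather than taken from the MIGraphX source, is that an axis in the range [-rank, rank - 1] is accepted and a negative value counts from the last dimension. A minimal sketch of that normalization, with `normalize_axis` as a hypothetical helper name:

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not MIGraphX code): map an axis in [-rank, rank - 1]
// to its non-negative equivalent in [0, rank - 1].
std::int64_t normalize_axis(std::int64_t axis, std::int64_t rank)
{
    if(axis < -rank || axis >= rank)
        throw std::out_of_range("axis out of range for tensor rank");
    // A negative axis counts back from the last dimension.
    return axis < 0 ? axis + rank : axis;
}
```

Doing this once on the host side, before any device code runs, matches the intent of the "move computing axis from the device function to the compute function" items: the kernel then only ever sees a non-negative axis.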

15 Nov, 2019 1 commit

Add option to do offload copying automatically (#403) · 81b0ff5d

Paul Fultz II authored Nov 15, 2019

* Add compiler options

* Add copy operators

* Formatting

* Use run_passes in tests

* Formatting

* Use run_pass in schedule test

* Formatting

* Add compile_options to get_passes in target

* Formatting

* Offload copy option

* Formatting

* Copy using pinned memory

* Formatting

* Improve performance of gpu copying

* Formatting

* Don't copy

* Formatting

* Always make an extra copy

* Formatting

* Remove unused write op

* Add missing include

* Remove copy_to_gpu function in python api

* Make offload copy disabled by default on C++

* Formatting

* Fix tidy issues

* Formatting

* Fix namespace

* Fix python tests

* Turn clang format off since it's broken

* Fix compile error on gcc 5

* Remove commented code

81b0ff5d

15 Oct, 2019 1 commit
- Use 32-bit integers for index calculations on the gpu (#387) · 193a3b7c
  Paul Fultz II authored Oct 15, 2019

  * use 32bit integers for indices

  * Formatting

  * Update more index types

  * Formatting

  193a3b7c
28 Jun, 2019 2 commits
- clang format · a7a686d5
  Shucai Xiao authored Jun 28, 2019
  
  a7a686d5
- further factor softmax/logsoftmax gpu implementation · 8ce6758a
  Shucai Xiao authored Jun 28, 2019
  
  8ce6758a
26 Jun, 2019 4 commits
- further code cleanup · 8817e238
  Shucai Xiao authored Jun 26, 2019
  
  8817e238
- code backup · 3e70d01b
  Shucai Xiao authored Jun 26, 2019
  
  3e70d01b
- clang format · 22500e6c
  Shucai Xiao authored Jun 25, 2019
  
  22500e6c
- add std namespace for size_t · ea932b63
  Shucai Xiao authored Jun 25, 2019
  
  ea932b63
25 Jun, 2019 12 commits
- minor changes of device function signature. · 4479fc3c
  Shucai Xiao authored Jun 25, 2019
  
  4479fc3c
- clang format · bfa455a1
  Shucai Xiao authored Jun 25, 2019
  
  bfa455a1
- more optimization of reduce operation. · 605cce41
  Shucai Xiao authored Jun 25, 2019
  
  605cce41
- clang format · 42b24bd1
  Shucai Xiao authored Jun 25, 2019
  
  42b24bd1
- fix build errors. · b9575730
  Shucai Xiao authored Jun 25, 2019
  
  b9575730
- code cleanup · 1a51797e
  Shucai Xiao authored Jun 25, 2019
  
  1a51797e
- clang format · b6786993
  Shucai Xiao authored Jun 25, 2019
  
  b6786993
- code refactor. · ccdacf44
  Shucai Xiao authored Jun 25, 2019
  
  ccdacf44
- clang format · 17a269a4
  Shucai Xiao authored Jun 24, 2019
  
  17a269a4
- code cleanup for softmax. · 63773ec0
  Shucai Xiao authored Jun 24, 2019
  
  63773ec0
- clang format · 6ae2f087
  Shucai Xiao authored Jun 24, 2019
  
  6ae2f087
- simplify the code for softmax gpu implementation. · aeb02070
  Shucai Xiao authored Jun 24, 2019
  
  aeb02070
24 Jun, 2019 4 commits
- clang format · ee877777
  Shucai Xiao authored Jun 24, 2019
  
  ee877777
- further refactoring of softmax and logsoftmax. · 8724242e
  Shucai Xiao authored Jun 24, 2019
  
  8724242e
- clang format · 6d1c23e9
  Shucai Xiao authored Jun 24, 2019
  
  6d1c23e9
- further optimization of the softmax and logsoftmax operator. · b8782a5f
  Shucai Xiao authored Jun 24, 2019
  
  b8782a5f
22 Jun, 2019 2 commits
- Formatting · 2220bd25
  Paul authored Jun 21, 2019
  
  2220bd25
- Refactor softmax · c9e391fe
  Paul authored Jun 21, 2019
  
  c9e391fe