- 10 Mar, 2022 2 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 08 Mar, 2022 3 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 07 Mar, 2022 2 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 08 Feb, 2022 2 commits
-
-
Khalique Ahmed authored
-
Khalique Ahmed authored
-
- 31 Jan, 2022 1 commit
-
-
Khalique Ahmed authored
-
- 09 Dec, 2021 1 commit
-
-
Shucai Xiao authored
Changed the number of threads in a block from 256 to 128. Increased the max number of blocks in the kernel from 256 to 1M. For the case where the axis is the last dimension, we removed the index computation since it is not required. With these changes, we get about a 2x speedup compared to the develop branch for the softmax op used in the BertSquad model.
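The idea in this commit message can be illustrated with a small CPU-side sketch. It is not the actual GPU kernel: `block_size`, `max_blocks`, `compute_blocks`, and `softmax_last_axis` are hypothetical names, and "1M" is read here as 2^20, which is an assumption. The point it shows is that when the softmax axis is the last dimension of a row-major tensor, each row is contiguous, so a row can be addressed by a plain offset without decomposing a flat index into a multi-index.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical constants mirroring the numbers in the commit message.
constexpr std::size_t block_size = 128;                   // threads per block
constexpr std::size_t max_blocks = std::size_t{1} << 20;  // "1M" read as 2^20 (assumption)

// Number of blocks needed to cover n work items, capped at max_blocks.
std::size_t compute_blocks(std::size_t n)
{
    return std::min((n + block_size - 1) / block_size, max_blocks);
}

// Softmax over the last dimension of a row-major tensor stored in `data`.
// Each row of length `axis_len` is contiguous, so the row start is a simple
// offset and no multi-index computation is needed.
void softmax_last_axis(std::vector<float>& data, std::size_t axis_len)
{
    assert(axis_len > 0 && data.size() % axis_len == 0);
    for(std::size_t start = 0; start < data.size(); start += axis_len)
    {
        float* row    = data.data() + start;
        // Subtract the row maximum before exponentiating for numerical stability.
        float max_val = *std::max_element(row, row + axis_len);
        float sum     = 0.0f;
        for(std::size_t i = 0; i < axis_len; ++i)
        {
            row[i] = std::exp(row[i] - max_val);
            sum += row[i];
        }
        for(std::size_t i = 0; i < axis_len; ++i)
            row[i] /= sum;
    }
}
```

On a GPU, the outer loop over rows would be distributed across blocks and the inner reductions across the threads of a block; the sketch keeps everything sequential so the addressing is easy to follow.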
-
- 27 Apr, 2021 1 commit
-
-
Khalique Ahmed authored
-
- 12 Feb, 2020 1 commit
-
-
Aaron Enye Shi authored
* Fix HIP-Clang GPU build issues
  Add missing device attributes for GPU functions. GPU functions must be annotated with __device__ in HIP.
* Use HIP device function max and min
* Fix clang-format-5.0 issues
* Undo change that breaks on HIP-HCC

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
-
- 20 Dec, 2019 1 commit
-
-
Shucai Xiao authored
* improve unsqueeze to support negative axis and parsing scalar
* clang format
* add a test example for the negative axis of unsqueeze
* improve the squeeze operator to support negative axis
* clang format
* fixed a small bug in the lrn implementation
* clang format
* support negative axis in argmax and argmin
* clang format
* improve flatten to support negative axis
* clang format
* change softmax/logsoftmax to support negative axis
* clang format
* improve transpose by adding default perm
* clang format
* add one more dimens for tensor size
* add one more dimens for tensor size
* disable conv ops fusion for non-symmetric cases
* clang format
* fixed review comments
* move computing axis from the device function to the compute function
* clang format
* move computing axis from device function to the operator computing function
* clang format

Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
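Negative-axis support of the kind listed above usually comes down to mapping an axis in `[-rank, rank)` to `[0, rank)` once, before the operator runs. The following is a minimal sketch of that mapping, not the code from this commit; `normalize_axis` is a hypothetical helper name. Doing this on the host side is in the spirit of the "move computing axis from the device function to the compute function" items, so that device code only ever sees a non-negative axis.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: map an axis in [-rank, rank) to [0, rank).
// For example, axis -1 on a rank-4 tensor refers to dimension 3.
std::size_t normalize_axis(std::int64_t axis, std::size_t rank)
{
    assert(rank > 0);
    const auto r = static_cast<std::int64_t>(rank);
    if(axis < -r || axis >= r)
        throw std::out_of_range("axis out of range for tensor rank");
    return static_cast<std::size_t>(axis < 0 ? axis + r : axis);
}
```

For an operator such as unsqueeze, the valid range is typically taken relative to the output rank rather than the input rank, so the `rank` argument would be chosen accordingly.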
-
- 15 Nov, 2019 1 commit
-
-
Paul Fultz II authored
* Add compiler options
* Add copy operators
* Formatting
* Use run_passes in tests
* Formatting
* Use run_pass in schedule test
* Formatting
* Add compile_options to get_passes in target
* Formatting
* Offload copy option
* Formatting
* Copy using pinned memory
* Formatting
* Improve performance of gpu copying
* Formatting
* Dont copy
* Formatting
* Always make an extra copy
* Formatting
* Remove unused write op
* Add missing include
* Remove copy_to_gpu function in python api
* Make offload copy disabled by default on C++
* Formatting
* Fix tidy issues
* Formatting
* Fix namespace
* Fix python tests
* Turn clang format off since its broken
* Fix compile error on gcc 5
* Remove commented code
-
- 15 Oct, 2019 1 commit
-
-
Paul Fultz II authored
* use 32bit integers for indices
* Formatting
* Update more index types
* Formatting
-
- 28 Jun, 2019 2 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 26 Jun, 2019 4 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 25 Jun, 2019 12 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 24 Jun, 2019 4 commits
-
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
Shucai Xiao authored
-
- 22 Jun, 2019 2 commits