  1. 01 Mar, 2022 3 commits
  2. 28 Feb, 2022 8 commits
  3. 26 Feb, 2022 2 commits
  4. 25 Feb, 2022 3 commits
  5. 24 Feb, 2022 1 commit
      Some cmake fixes and updates (#1088) · cd0a4aa5
      Paul Fultz II authored
      Make doc/CMakeLists.txt standalone
      Switch to use rocm-cmake modules for document generation
      Add CONFIGURE_DEPENDS to file(GLOB) so it will update without an explicit cmake run
      Add STRINGS property for build type to make it easier to switch build types with ccmake
      Various fixes and improvements
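      The two CMake features named in this commit message can be sketched as follows. This is a minimal, hypothetical fragment rather than the contents of the actual commit: the project name `glob_example`, the variable `EXAMPLE_SOURCES`, the `src/*.cpp` layout, and the `example` target are all assumptions made for illustration.

      ```cmake
      cmake_minimum_required(VERSION 3.12)  # CONFIGURE_DEPENDS needs CMake 3.12 or newer
      project(glob_example LANGUAGES CXX)   # hypothetical project name

      # CONFIGURE_DEPENDS makes the build re-check the glob each time it runs;
      # if the set of matching files changed, CMake is re-run automatically.
      # The src/*.cpp layout is assumed for this sketch.
      file(GLOB EXAMPLE_SOURCES CONFIGURE_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/src/*.cpp)

      # Declare the build type as a cache entry, then attach the STRINGS property
      # so ccmake and cmake-gui offer the values as a selectable list.
      set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type")
      set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS Debug Release RelWithDebInfo MinSizeRel)

      add_library(example ${EXAMPLE_SOURCES})  # hypothetical target
      ```

      Without CONFIGURE_DEPENDS, a newly added source file is only picked up after cmake is re-run by hand. The STRINGS property does not restrict which values are accepted; it only tells the interactive front ends which choices to present.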
  6. 09 Feb, 2022 5 commits
  7. 08 Feb, 2022 5 commits
  8. 04 Feb, 2022 2 commits
  9. 31 Jan, 2022 1 commit
  10. 28 Jan, 2022 1 commit
  11. 27 Jan, 2022 1 commit
  12. 21 Jan, 2022 1 commit
  13. 10 Jan, 2022 1 commit
  14. 09 Dec, 2021 1 commit
      Softmax perf optimization (#1014) · 2e337c7f
      Shucai Xiao authored
      Changed the number of threads in a block from 256 to 128
      Increased the max number of blocks in the kernel from 256 to 1M.
      For the case that the axis is the last dimension, we removed the computation of index since it is not required.
      
      With these changes, we can get about 2x speedup compared to the develop branch for the softmax op used in the BertSquad model.
  15. 08 Dec, 2021 1 commit
  16. 07 Dec, 2021 1 commit
  17. 02 Dec, 2021 1 commit
  18. 30 Nov, 2021 2 commits