1. 29 Mar, 2022 1 commit
      Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6
      Paul Fultz II authored
      This adds the infrastructure to compile all runtime-compiled kernels in parallel; previously only pointwise kernels were compiled in parallel. This will also integrate directly with lowering and the gpu-driver. The pointwise and roialign kernels now use this infrastructure. Scatternd does not, since it requires a standard shape.
      
      This also makes it easier to add new runtime compiled kernels in the future.
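As a rough illustration of compiling several kernels in parallel, the sketch below launches one asynchronous task per kernel source and gathers the results in input order. It is not the compile_ops pipeline from this commit: `compiled_kernel`, `fake_compile`, and `compile_all` are hypothetical names, and the compile step is a stand-in for a real device compiler call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiled kernel; not a MIGraphX type.
struct compiled_kernel
{
    std::string name;
    std::size_t code_size;
};

// Hypothetical compile step; a real one would invoke the device compiler.
inline compiled_kernel fake_compile(const std::string& source)
{
    return {"kernel_" + std::to_string(source.size()), source.size()};
}

// Launch one asynchronous task per source, then gather results in input order.
inline std::vector<compiled_kernel>
compile_all(const std::vector<std::string>& sources,
            const std::function<compiled_kernel(const std::string&)>& compile)
{
    assert(compile && "a compile callback is required");

    std::vector<std::future<compiled_kernel>> pending;
    pending.reserve(sources.size());
    for(const auto& src : sources)
        pending.push_back(
            std::async(std::launch::async, [&compile, &src] { return compile(src); }));

    // Every future is waited on here, so the captured references stay valid.
    std::vector<compiled_kernel> results;
    results.reserve(pending.size());
    for(auto& f : pending)
        results.push_back(f.get());
    return results;
}
```

Because every kernel goes through the same `compile` callback, adding a new kernel kind in this sketch only means adding another source to the list.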
  2. 28 Mar, 2022 1 commit
  3. 18 Mar, 2022 1 commit
  4. 15 Mar, 2022 1 commit
      Add iterators to kernels tensor_view and fix roialign to work with non-standard shape (#1126) · 31e63991
      Paul Fultz II authored
      This adds iterators to tensor_view, which allows kernels such as roialign to work with non-standard shapes.
      
      To improve the performance of indexing when using the iterators, the shape class was updated to use integral_constants since the compiler doesn't always fold the const values. An integral_constant will at least enforce that in the AST.
      
      Finally, since index calculations with single integers are improved, I also updated pointwise to use a single index rather than a multi-index. This gives about a 4% improvement in some cases.
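The idea of a shape built from compile-time constants, addressed with a single linear index, can be sketched as follows. This is a simplified 2-D illustration and not the shape class from the commit: `static_shape`, `offset`, and `gather` are hypothetical names. Lengths and strides are carried as `std::integral_constant` types so the index arithmetic uses constants known at compile time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical 2-D shape whose lengths and strides are compile-time constants.
template <std::size_t Rows, std::size_t Cols, std::size_t RowStride, std::size_t ColStride>
struct static_shape
{
    using rows       = std::integral_constant<std::size_t, Rows>;
    using cols       = std::integral_constant<std::size_t, Cols>;
    using row_stride = std::integral_constant<std::size_t, RowStride>;
    using col_stride = std::integral_constant<std::size_t, ColStride>;

    static constexpr std::size_t elements() { return rows::value * cols::value; }

    // Map a single linear element index to a memory offset.
    static constexpr std::size_t offset(std::size_t i)
    {
        return (i / cols::value) * row_stride::value + (i % cols::value) * col_stride::value;
    }
};

// Read elements in logical order through the single-index mapping.
// Works for non-standard (for example, transposed) layouts.
template <class Shape, class T>
std::array<T, Shape::elements()> gather(const T* data)
{
    assert(data != nullptr);
    std::array<T, Shape::elements()> out{};
    for(std::size_t i = 0; i < Shape::elements(); ++i)
        out[i] = data[Shape::offset(i)];
    return out;
}
```

For example, `static_shape<2, 3, 1, 2>` describes a 2x3 transposed view over a 3x2 row-major buffer, and `gather` visits it in logical row order without building a multi-index.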
  5. 14 Mar, 2022 1 commit
  6. 04 Mar, 2022 1 commit
      Mode as enum for pooling and roi_align (#1091) · a2e90b5d
      bpickrel authored
      Changed the pooling mode values for two structures from strings to specialized enum classes. This required many test and operator-parsing changes. Introduces one new source file, op_enums.cpp.
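A minimal sketch of replacing a string-valued mode with an enum class might look like the following. The enum name, its enumerators, and the helper functions are assumptions for illustration and are not taken from op_enums.cpp.

```cpp
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical enum; the enumerator names are assumptions for illustration.
enum class pooling_mode
{
    average,
    max
};

// Convert the enum to its textual form, e.g. for printing or serialization.
inline std::string to_string(pooling_mode m)
{
    switch(m)
    {
    case pooling_mode::average: return "average";
    case pooling_mode::max: return "max";
    }
    throw std::invalid_argument("unknown pooling_mode value");
}

// Parse the textual form once, at the boundary (e.g. in an operator parser),
// so the rest of the code compares enum values instead of strings.
inline pooling_mode parse_pooling_mode(const std::string& s)
{
    if(s == "average")
        return pooling_mode::average;
    if(s == "max")
        return pooling_mode::max;
    throw std::invalid_argument("unknown pooling mode: " + s);
}

inline std::ostream& operator<<(std::ostream& os, pooling_mode m)
{
    return os << to_string(m);
}
```

With an enum class, a misspelled mode fails when it is parsed or at compile time, rather than silently falling through a string comparison later.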
  7. 03 Mar, 2022 3 commits
  8. 02 Mar, 2022 2 commits
  9. 25 Feb, 2022 1 commit
  10. 24 Feb, 2022 1 commit
      Some cmake fixes and updates (#1088) · cd0a4aa5
      Paul Fultz II authored
      Make doc/CMakeLists.txt standalone
      Switch to use rocm-cmake modules for document generation
      Add CONFIGURE_DEPENDS to file(GLOB) so it will update without an explicit cmake run
      Add STRINGS property for build type to make it easier to switch build types with ccmake
      Various fixes and improvements
  11. 21 Feb, 2022 2 commits
  12. 09 Feb, 2022 1 commit
  13. 08 Feb, 2022 2 commits
  14. 07 Feb, 2022 2 commits
  15. 05 Feb, 2022 2 commits
  16. 28 Jan, 2022 1 commit
  17. 27 Jan, 2022 1 commit
  18. 26 Jan, 2022 1 commit
      Updates · 1cc6c88c
      Paul authored
  19. 21 Jan, 2022 1 commit
  20. 10 Jan, 2022 3 commits
  21. 07 Jan, 2022 2 commits
  22. 06 Jan, 2022 3 commits
  23. 11 Dec, 2021 5 commits
  24. 09 Dec, 2021 1 commit
      Softmax perf optimization (#1014) · 2e337c7f
      Shucai Xiao authored
      Changed the number of threads in a block from 256 to 128.
      Increased the max number of blocks in the kernel from 256 to 1M.
      When the axis is the last dimension, the index computation is skipped since it is not required.
      
      With these changes, we get about a 2x speedup over the develop branch for the softmax op used in the BertSquad model.
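To illustrate why no index computation is needed when the softmax axis is the last dimension, here is a CPU-only sketch, assuming a row-major buffer. It is not the GPU kernel from this commit, and `softmax_last_axis` is a hypothetical name. Each row is contiguous in memory, so elements are reached with a plain pointer offset rather than a multi-index.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU sketch: in-place softmax over the last axis of a row-major buffer
// with `cols` elements per row.
inline void softmax_last_axis(std::vector<float>& data, std::size_t cols)
{
    assert(cols > 0 && data.size() % cols == 0);
    for(std::size_t start = 0; start < data.size(); start += cols)
    {
        // The row is contiguous, so no per-element index decomposition is needed.
        float* row = data.data() + start;

        // Subtract the row maximum before exponentiating for numerical stability.
        const float max_val = *std::max_element(row, row + cols);
        float sum           = 0.0f;
        for(std::size_t j = 0; j < cols; ++j)
        {
            row[j] = std::exp(row[j] - max_val);
            sum += row[j];
        }
        for(std::size_t j = 0; j < cols; ++j)
            row[j] /= sum;
    }
}
```

For any other axis, the elements of one reduction are strided in memory, which is where a general kernel has to compute indices.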