1. 25 Jun, 2022 2 commits
  2. 23 Jun, 2022 1 commit
  3. 22 Jun, 2022 1 commit
  4. 20 Jun, 2022 1 commit
  5. 17 Jun, 2022 2 commits
    • Update lowering of Dot operator (#1247) · c99be32c
      Umang Yadav authored
      
      
      * remove code for allocation of C param in dot lowering
      
      * formatting
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
    • Create allocate op and replace_allocate pass (#1183) · add6fb3b
      kahmed10 authored
      
      
      * add allocate op header
      
      * formatting
      
      * add replace_allocate pass
      
      * formatting
      
      * move output param to remove_allocate pass
      
      * formatting
      
      * fix bugs in replace_allocate pass
      
      * formatting
      
      * fix verify if tests
      
      * formatting
      
      * move if op logic
      
      * formatting
      
      * cleanup lowering
      
      * cleanup lowering
      
      * formatting
      
      * fix tidy
      
      * formatting
      
      * fix tidy
      
      * add cpu allocate check
      
      * formatting
      
      * change cpu allocate in pass
      
      * formatting
      
      * add some tests for replace_allocate pass
      
      * formatting
      
      * pass by ref
      
      * fix run_pass
      
      * formatting
      
      * update variable name for module
      
      * update dce to use contains() and fix tidy
      
      * formatting
      
      * update cppcheck
      
      * add if test
      
      * formatting
      
      * add if test
      
      * rename var to mod_output_names
      
      * formatting
      
      * remove conditional
      
      * update allocate op and tests
      
      * formatting
      
      * update replace_allocate tests
      
      * update create_output_names() and conditional in replace_allocate
      
      * formatting
      
      * remove extra variable in replace_allocate
      
      * update tools script for allocation_model
      Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
  6. 10 Jun, 2022 1 commit
  7. 07 Jun, 2022 1 commit
  8. 03 Jun, 2022 1 commit
    • Group code objects by kernel name in perf report summary (#1234) · 7271ddbc
      Paul Fultz II authored
      Break up the gpu::code_object print to show the actual kernels...
      
      gpu::code_object::add_kernel: 0.646121ms, 5%
      gpu::code_object::mul_kernel: 0.623822ms, 5%
      gpu::code_object::add_mul_erf_add_mul_mul_kernel: 0.498902ms, 4%
      gpu::code_object::mul_add_kernel: 0.478352ms, 4%
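The grouping shown in the summary lines above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the `timing` record, `summarize`, and `percent_of_total` are hypothetical names, and the assumption is that each code object is keyed by its op name plus its kernel symbol name.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one timed instruction in a perf run.
struct timing
{
    std::string op_name;     // e.g. "gpu::code_object"
    std::string symbol_name; // e.g. "add_kernel"; empty when the op has no kernel symbol
    double ms;
};

// Sum the time per summary row. Ops with a symbol name are keyed as
// "<op_name>::<symbol_name>", so every kernel gets its own row.
std::map<std::string, double> summarize(const std::vector<timing>& timings)
{
    std::map<std::string, double> totals;
    for(const auto& t : timings)
    {
        std::string key = t.op_name;
        if(!t.symbol_name.empty())
            key += "::" + t.symbol_name;
        totals[key] += t.ms;
    }
    return totals;
}

// Share of the overall time spent in one summary row, in percent.
double percent_of_total(const std::map<std::string, double>& totals, const std::string& key)
{
    double sum = 0;
    for(const auto& p : totals)
        sum += p.second;
    auto it = totals.find(key);
    if(it == totals.end() || sum == 0)
        return 0;
    return 100.0 * it->second / sum;
}
```

Two `add_kernel` launches would then land in a single `gpu::code_object::add_kernel` row instead of being folded into one `gpu::code_object` total.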
  9. 02 Jun, 2022 1 commit
  10. 26 May, 2022 1 commit
  11. 24 May, 2022 3 commits
  12. 20 May, 2022 1 commit
    • Rename pointwise ops (#1145) · 4a312201
      kahmed10 authored
      Renamed for clarity of the kernel names seen when profiling. The new names follow the order of the ops being compiled. For example: add + relu = add_relu_kernel.
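The naming rule can be sketched as below. The helper name `pointwise_kernel_name` is hypothetical; only the resulting pattern (op names in order, joined by underscores, with a `_kernel` suffix) comes from the example in the commit message.

```cpp
#include <string>
#include <vector>

// Build a kernel name from the fused ops, in the order they are compiled.
std::string pointwise_kernel_name(const std::vector<std::string>& op_names)
{
    std::string name;
    for(const auto& op : op_names)
        name += op + "_";
    return name + "kernel";
}
```

Calling `pointwise_kernel_name({"add", "relu"})` gives `add_relu_kernel`.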
  13. 17 May, 2022 1 commit
  14. 11 May, 2022 1 commit
  15. 09 May, 2022 1 commit
  16. 06 May, 2022 1 commit
  17. 05 May, 2022 1 commit
    • Cppcheck fixes (#1195) · d582425b
      Paul Fultz II authored
      Fixes the #error when using cppcheck. This no longer suppresses cppcheck errors when including those errors. This also fixes the cppcheck errors that were already there.
  18. 29 Apr, 2022 1 commit
  19. 27 Apr, 2022 1 commit
    • Add lane reduction (#1180) · 4c72cc95
      Paul Fultz II authored
      With reductions such as {2048, 2, 1456} on axis 1, this is 23x faster than using our new block_reduce, and it's even over 100x faster than our original reduce_sum:
      
      # lane
      gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
      # block
      gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
      # original
      gpu::reduce_sum[axes={1}]: 6.73456ms
      There is some basic logic to pick between lane and block reduce automatically.
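As a rough host-side illustration of the lane idea (not the GPU kernel from the PR), each output element can be reduced by one independent loop over the reduction axis, with no cooperation between lanes. The function name and the fixed [outer, reduce, inner] row-major layout are assumptions made for this sketch. For {2048, 2, 1456} on axis 1 there are 2048 * 1456 = 2981888 outputs, which matches the global size in the lane log line above.

```cpp
#include <cstddef>
#include <vector>

// Sum a row-major [outer, reduce, inner] tensor over its middle axis.
// Each output index plays the role of one "lane": it walks the reduce
// axis on its own, stepping through memory with a stride of `inner`.
std::vector<float> lane_reduce_sum(const std::vector<float>& x,
                                   std::size_t outer,
                                   std::size_t reduce,
                                   std::size_t inner)
{
    std::vector<float> out(outer * inner, 0.0f);
    for(std::size_t lane = 0; lane < out.size(); lane++)
    {
        std::size_t o = lane / inner; // position along the outer axis
        std::size_t i = lane % inner; // position along the inner axis
        float sum     = 0.0f;
        for(std::size_t r = 0; r < reduce; r++)
            sum += x[(o * reduce + r) * inner + i];
        out[lane] = sum;
    }
    return out;
}
```

A block reduce would instead have many threads cooperate on a single output, which pays off when the reduction axis is long rather than short as in this shape.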
  20. 17 Apr, 2022 1 commit
    • Reduce with runtime compilation (#1150) · f9a5b81e
      Paul Fultz II authored
      There is a significant improvement on larger tensors; with half, it is almost 50% faster:
      
      lens: [1024, 384, 768]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
      gpu::reduce_sum[axes={2}]: 1.73126ms
      Also for non-trivial layouts this can sometimes be over 2x faster:
      
      lens: [64, 1024, 768, 4]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
      gpu::reduce_sum[axes={1}]: 2.63375ms
      Of course, if the stride becomes larger, this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such kernels. I plan to address that in a future PR.
      
      Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
  21. 14 Apr, 2022 1 commit
    • Half2 overloads (#1157) · 12007dba
      bpickrel authored
      Issue 1127: updates the math.hpp header file to overload various standard functions (ops) for the hip half2 type. The half2 type is two 16-bit floats packed into a 32-bit number, so the overloads act on vectors whose sizes are multiples of 2. They are invoked in runtime compilation any time one of the ops is called on a tensor declared with the data type shape::half_type.
      
      Defined a new template, made instances of the template for those math operations that the hip library contains, and added verify tests for the sqrt operator for three cases:
      
      * tensor size not divisible by 2
      * tensor size divisible by 2 but not by 4
      * tensor size divisible by 4
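The packed-pair idea can be illustrated on the host as follows. This is only a sketch under stated assumptions: `pair2` is a stand-in for HIP's half2 (plain floats are used so the example compiles anywhere), the `sketch` namespace and `sqrt_all` are hypothetical, and the scalar step for an odd trailing element is one possible way to cover the "not divisible by 2" case, not necessarily what the PR does.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

namespace sketch {

// Stand-in for a packed pair of 16-bit floats.
struct pair2
{
    float x;
    float y;
};

// Overload of sqrt for the packed type: apply the scalar op to both halves.
inline pair2 sqrt(pair2 v) { return {std::sqrt(v.x), std::sqrt(v.y)}; }

// Apply sqrt two elements at a time, with a scalar step for an odd tail.
inline std::vector<float> sqrt_all(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    std::size_t i = 0;
    for(; i + 1 < in.size(); i += 2)
    {
        pair2 r    = sqrt(pair2{in[i], in[i + 1]});
        out[i]     = r.x;
        out[i + 1] = r.y;
    }
    if(i < in.size())
        out[i] = std::sqrt(in[i]);
    return out;
}

} // namespace sketch
```

Inputs of size 3, 2, and 4 exercise the same three cases as the verify tests listed above.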
  22. 12 Apr, 2022 1 commit
  23. 11 Apr, 2022 2 commits
    • scatter operator refactoring to include reduction (#1124) · 701c2014
      bpickrel authored
      Changes the "scatter" struct and op to a base/child set of three: scatter_none, scatter_add, and scatter_mul, to mirror the ONNX ScatterElements op and its three reduction options. (The ONNX Scatter op is deprecated and is equivalent to scatter_none.)
      
      Provides both a reference op and an update to ONNX parsing. Tests were updated and a new test case was added.
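The three reduction options can be sketched for the simplest case, a 1-D tensor scattered along axis 0. The enum and function names here are hypothetical and do not reproduce the ops added in the PR; negative indices, which ONNX allows, are left out to keep the sketch short.

```cpp
#include <cstddef>
#include <vector>

// The three ScatterElements reduction modes.
enum class scatter_reduction
{
    none,
    add,
    mul
};

// Copy `data`, then write each update to the position named by its index:
// none overwrites, add accumulates, mul multiplies into the existing value.
std::vector<float> scatter_1d(std::vector<float> data,
                              const std::vector<std::size_t>& indices,
                              const std::vector<float>& updates,
                              scatter_reduction reduction)
{
    for(std::size_t i = 0; i < indices.size() && i < updates.size(); i++)
    {
        float& dst = data.at(indices[i]); // at() throws on an out-of-range index
        switch(reduction)
        {
        case scatter_reduction::none: dst = updates[i]; break;
        case scatter_reduction::add: dst += updates[i]; break;
        case scatter_reduction::mul: dst *= updates[i]; break;
        }
    }
    return data;
}
```

With `scatter_reduction::none` this behaves like the deprecated Scatter op mentioned above.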
    • fix a bug in create tensor_view with vec data type (#1155) · 3c301efa
      Shucai Xiao authored
      When creating a tensor_view with a vector data type, the last dimension of the shape should be divided by the vec_size.
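The shape adjustment amounts to the following, shown as a minimal sketch. `vectorized_lens` is a hypothetical helper, and rejecting a last dimension that is not a multiple of `vec_size` is an assumption of this sketch rather than something the commit states.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Dimensions of a view whose elements are vectors of `vec_size` scalars:
// only the last (fastest-varying) dimension shrinks.
std::vector<std::size_t> vectorized_lens(std::vector<std::size_t> lens, std::size_t vec_size)
{
    if(lens.empty() || vec_size == 0 || lens.back() % vec_size != 0)
        throw std::invalid_argument("last dimension must be a multiple of vec_size");
    lens.back() /= vec_size;
    return lens;
}
```

For example, a {2, 3, 8} tensor viewed with `vec_size` 4 has dimensions {2, 3, 2}.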
  24. 29 Mar, 2022 1 commit
    • Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6
      Paul Fultz II authored
      This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign use this infrastructure. Scatternd does not, since it requires a standard shape.
      
      This also makes it easier to add new runtime compiled kernels in the future.
  25. 28 Mar, 2022 1 commit
  26. 18 Mar, 2022 1 commit
  27. 15 Mar, 2022 1 commit
    • Add iterators to kernels tensor_view and fix roialign to work with non-standard shape (#1126) · 31e63991
      Paul Fultz II authored
      This adds iterators to tensor_view, which can allow kernels to work with non-standard shapes like for roialign.
      
      To improve the performance of indexing when using the iterators, the shape class was updated to use integral_constants since the compiler doesn't always fold the const values. An integral_constant will at least enforce that in the AST.
      
      Finally, since index calculations with single integers are improved, I also updated pointwise to use a single index rather than a multi index. There is about a 4% improvement in some cases.
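The relationship between a single (linear) index and a multi index can be sketched as below. This is an illustration under assumptions, not the kernel code: `offset_from_linear` is a hypothetical helper, and it uses plain runtime values where the PR uses integral_constants.

```cpp
#include <array>
#include <cstddef>

// Map a linear element index to a memory offset for a shape given by
// `lens` and `strides`. The linear index is peeled apart from the last
// dimension to the first, which is what a multi index does explicitly.
template <std::size_t N>
std::size_t offset_from_linear(std::size_t i,
                               const std::array<std::size_t, N>& lens,
                               const std::array<std::size_t, N>& strides)
{
    std::size_t offset = 0;
    for(std::size_t d = N; d > 0; d--)
    {
        offset += (i % lens[d - 1]) * strides[d - 1];
        i /= lens[d - 1];
    }
    return offset;
}
```

For a standard (packed, row-major) shape the offset equals the linear index, so the division and modulo work can be skipped entirely. For a non-standard shape, such as a transposed view, the two differ, and that is where the iterators mentioned above are needed.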
  28. 14 Mar, 2022 1 commit
  29. 04 Mar, 2022 1 commit
    • Mode as enum for pooling and roi_align (#1091) · a2e90b5d
      bpickrel authored
      Changed the pooling values for two structures from strings to specialized enum classes. Many test and operator parsing changes were made to support this. Introduces one new source file, op_enums.cpp.
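A string-to-enum change of this kind might look like the sketch below. The enum name, its two values, and both helper functions are assumptions made for illustration and are not taken from op_enums.cpp.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical enum replacing a free-form mode string.
enum class pooling_mode
{
    average,
    max
};

// Convert the string used by a parser into the enum; unknown names are
// rejected once, at parse time, instead of at every later comparison.
pooling_mode parse_pooling_mode(const std::string& s)
{
    if(s == "average")
        return pooling_mode::average;
    if(s == "max")
        return pooling_mode::max;
    throw std::invalid_argument("unknown pooling mode: " + s);
}

// Convert back to text, e.g. for printing an operator's attributes.
std::string to_string(pooling_mode m)
{
    return m == pooling_mode::average ? "average" : "max";
}
```

One benefit of the enum is that a misspelled mode becomes a compile error in C++ code rather than a silent mismatch at runtime.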
  30. 03 Mar, 2022 3 commits
  31. 02 Mar, 2022 2 commits
  32. 25 Feb, 2022 1 commit