1. 07 Jul, 2022 1 commit
  2. 29 Apr, 2022 1 commit
  3. 27 Apr, 2022 1 commit
      Add lane reduction (#1180) · 4c72cc95
      Paul Fultz II authored
      With reductions such as {2048, 2, 1456} on axis 1, this is 23x faster than using our new block_reduce, and it's even over 100x faster than our original reduce_sum:
      
      # lane
      gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
      # block
      gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
      # original
      gpu::reduce_sum[axes={1}]: 6.73456ms
      There is some basic logic to pick between lane and block reduce automatically.
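      The difference between the two strategies can be illustrated on the CPU. The following is a minimal sketch, not the kernel from this commit: the function name `lane_reduce_sum_axis1` and the row-major `{outer, r, inner}` layout are assumptions made for illustration. In a lane-style reduction each work-item owns one output element and loops serially over the reduced axis, which fits the lane kernel's `global=2981888` above (2048 * 1456 output elements). A block-style reduction would instead have a group of work-items cooperate on each output.

```cpp
#include <cstddef>
#include <vector>

// CPU sketch of a "lane" style sum over axis 1 of a row-major
// {outer, r, inner} tensor. Each (o, i) pair stands in for one GPU
// work-item that reduces its own output element serially.
// The name and signature are hypothetical, not MIGraphX API.
std::vector<float> lane_reduce_sum_axis1(const std::vector<float>& x,
                                         std::size_t outer,
                                         std::size_t r,
                                         std::size_t inner)
{
    std::vector<float> out(outer * inner, 0.0f);
    for(std::size_t o = 0; o < outer; o++)
    {
        for(std::size_t i = 0; i < inner; i++)
        {
            float acc = 0.0f;
            // Serial loop over the reduced axis; consecutive values of i
            // read consecutive addresses, so neighbouring work-items
            // would access memory contiguously.
            for(std::size_t k = 0; k < r; k++)
                acc += x[(o * r + k) * inner + i];
            out[o * inner + i] = acc;
        }
    }
    return out;
}
```

      With a short reduced axis (here length 2) there is very little work to share inside a block, which is one plausible reason the per-element approach wins for this shape.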
  4. 26 Apr, 2022 1 commit
  5. 23 Apr, 2022 1 commit
      ReverseSequence op (#1177) · 31906785
      Charlie Lin authored
      Implements the ReverseSequence ONNX operator as a parser.
      
      This parser can only handle a constant sequence_lens input. This is the same as what is handled for TensorRT as far as I can tell.
      We could handle a variable sequence_lens input; that would require ref and GPU implementations of the operator.
      The ONNX backend tests are disabled because this does not handle variable sequence_lens.
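      For reference, the operator's semantics can be sketched as below. This is an illustration of what ReverseSequence computes, not the parser code from this commit. It assumes a row-major 2-D input with the batch axis first and the time axis second, and the function name `reverse_sequence` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// For each batch row b, reverse the first sequence_lens[b] elements
// along the time axis and leave the remaining elements untouched.
// Assumes x holds batch * time values in row-major order.
std::vector<int> reverse_sequence(std::vector<int> x,
                                  std::size_t batch,
                                  std::size_t time,
                                  const std::vector<std::size_t>& sequence_lens)
{
    for(std::size_t b = 0; b < batch; b++)
    {
        int* row        = x.data() + b * time;
        std::size_t len = std::min(sequence_lens[b], time);
        std::reverse(row, row + len);
    }
    return x;
}
```

      Because `sequence_lens` is known at parse time in the constant case, the amount to reverse per row is fixed up front, which is what makes a parser-only implementation possible.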
  6. 19 Apr, 2022 1 commit
      Refactor Pooling and implement ONNX LpPool and GlobalLpPool (#1152) · 764273e4
      Charlie Lin authored
      Refactored the reference implementation of pooling along the lines of what was done for roialign. Moved the reference implementation of pooling from targets/ref/lowering.cpp to pooling.hpp.
      Removed cpu_pooling; the reference pooling in pooling.hpp is used instead.
      Added a reference implementation of Lp norm pooling and its global version.
      Added tests for Lp norm pooling.
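      As an illustration of what Lp norm pooling computes, here is a minimal 1-D sketch without padding. It is not the code in pooling.hpp, and the name `lp_pool_1d` and its parameters are assumptions. Each output is the p-norm of its window, `(sum |x|^p)^(1/p)`; the global version corresponds to a window that spans the whole input.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// 1-D Lp norm pooling without padding: each output element is
// (sum over the window of |x|^p)^(1/p).
// Returns an empty vector if the arguments cannot form a window.
std::vector<double> lp_pool_1d(const std::vector<double>& x,
                               std::size_t kernel,
                               std::size_t stride,
                               double p)
{
    std::vector<double> out;
    if(kernel == 0 || stride == 0 || x.size() < kernel)
        return out;
    for(std::size_t start = 0; start + kernel <= x.size(); start += stride)
    {
        double acc = 0.0;
        for(std::size_t k = 0; k < kernel; k++)
            acc += std::pow(std::abs(x[start + k]), p);
        out.push_back(std::pow(acc, 1.0 / p));
    }
    return out;
}
```

      Calling it with `kernel == x.size()` gives the global variant: a single output covering the whole input.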
  7. 17 Apr, 2022 1 commit
      Reduce with runtime compilation (#1150) · f9a5b81e
      Paul Fultz II authored
      There is a significant improvement on larger tensors, with half being almost 50% faster:
      
      lens: [1024, 384, 768]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
      gpu::reduce_sum[axes={2}]: 1.73126ms
      Also for non-trivial layouts this can sometimes be over 2x faster:
      
      lens: [64, 1024, 768, 4]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
      gpu::reduce_sum[axes={1}]: 2.63375ms
      Of course, if the stride becomes larger, this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for these kinds of kernels. I plan to address that in a future PR.
      
      Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
  8. 14 Apr, 2022 2 commits
      Half2 overloads (#1157) · 12007dba
      bpickrel authored
      Issue 1127: Updates the math.hpp header file to add overloads of various standard functions (ops) for the hip half2 type. The half2 type is two 16-bit floats packed into a 32-bit number, and therefore the overloads act on vectors whose sizes are multiples of 2. They are invoked in runtime compilation any time one of the ops is called on a tensor declared with the data type shape::half_type.
      
      Defined a new template, made instances of it for those math operations that the hip library contains, and added verify tests for the sqrt operator for three cases:
      
      tensor size not divisible by 2
      tensor size divisible by 2 but not by 4
      tensor size divisible by 4
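      The idea of lifting a scalar function to a packed two-element type can be sketched on the CPU as follows. Everything here is a stand-in: `float2_t` replaces the HIP half2 type, `apply2` and `sqrt_all` are hypothetical names, and the scalar loop for an odd trailing element is an assumption about how sizes that are not multiples of 2 could be handled, not a description of what math.hpp does.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a packed two-element type such as HIP's half2.
struct float2_t
{
    float x;
    float y;
};

// Lift a scalar function to the packed type by applying it to both elements.
template <class F>
float2_t apply2(F f, float2_t v)
{
    return {f(v.x), f(v.y)};
}

// Overload of sqrt for the packed type, built from the generic helper.
inline float2_t sqrt(float2_t v)
{
    return apply2([](float a) { return std::sqrt(a); }, v);
}

// Apply sqrt two elements at a time; a scalar loop handles an odd
// trailing element (an assumption made for this sketch).
std::vector<float> sqrt_all(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    std::size_t i = 0;
    for(; i + 1 < in.size(); i += 2)
    {
        float2_t r = sqrt(float2_t{in[i], in[i + 1]});
        out[i]     = r.x;
        out[i + 1] = r.y;
    }
    for(; i < in.size(); i++)
        out[i] = std::sqrt(in[i]);
    return out;
}
```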
      Fix file download for resnet50 example (#1164) · a930f1d5
      kahmed10 authored
      Update the path to where the file is located.
  9. 13 Apr, 2022 1 commit
  10. 12 Apr, 2022 2 commits
  11. 11 Apr, 2022 2 commits
      scatter operator refactoring to include reduction (#1124) · 701c2014
      bpickrel authored
      Change the "scatter" struct and op to a base/child set of three: scatter_none, scatter_add, and scatter_mul, to mirror ONNX's ScatterElements op and its three reduction options. (The ONNX Scatter op is deprecated and is equivalent to scatter_none.)
      
      Provides both a reference op and an update to ONNX parsing. Tests updated and a new test case added.
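      The three reduction modes can be illustrated with a 1-D sketch. This is not the reference op from this commit: the enum `scatter_reduction` and the function `scatter_elements_1d` are hypothetical names, and silently skipping out-of-range indices is a simplification chosen for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical enum mirroring the three reduction options.
enum class scatter_reduction
{
    none,
    add,
    mul
};

// 1-D scatter: for each i, combine updates[i] into data[indices[i]]
// according to the chosen reduction mode.
std::vector<float> scatter_elements_1d(std::vector<float> data,
                                       const std::vector<std::size_t>& indices,
                                       const std::vector<float>& updates,
                                       scatter_reduction mode)
{
    for(std::size_t i = 0; i < indices.size() && i < updates.size(); i++)
    {
        std::size_t idx = indices[i];
        if(idx >= data.size())
            continue; // out-of-range indices are skipped in this sketch
        switch(mode)
        {
        case scatter_reduction::none: data[idx] = updates[i]; break;
        case scatter_reduction::add: data[idx] += updates[i]; break;
        case scatter_reduction::mul: data[idx] *= updates[i]; break;
        }
    }
    return data;
}
```

      The modes only differ when an index repeats or when the existing value matters: with `none` the last update wins, while `add` and `mul` accumulate into the existing value.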
      fix a bug in create tensor_view with vec data type (#1155) · 3c301efa
      Shucai Xiao authored
      When creating a tensor_view with a vector data type, the last dimension of the shape should be divided by the vec_size.
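      The adjustment amounts to the following, shown as a minimal sketch. The helper name `vectorized_lens` is hypothetical, and the sketch assumes the last dimension is a multiple of `vec_size`.

```cpp
#include <cstddef>
#include <vector>

// Given the element-wise dimensions of a tensor, return the dimensions
// seen when each element is a vector of vec_size scalars.
// Assumes the last dimension is divisible by vec_size.
std::vector<std::size_t> vectorized_lens(std::vector<std::size_t> lens,
                                         std::size_t vec_size)
{
    if(!lens.empty() && vec_size > 0)
        lens.back() /= vec_size;
    return lens;
}
```

      For example, a `{2, 8}` tensor viewed with a vector size of 4 has dimensions `{2, 2}`.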
  12. 08 Apr, 2022 1 commit
  13. 06 Apr, 2022 1 commit
  14. 01 Apr, 2022 1 commit
      Update developer overview, fix doc CMakeLists (#1140) · 0295965d
      Charlie Lin authored
      * Fix and change doc CMakeLists
      1. Fix include directory location with change from #1088
      2. Create a DoxygenWarningLog.txt file in <build_dir>/doc/doxygen
      3. Move compiled html or pdf files to <build_dir>/doc/[pdf, html]
  15. 31 Mar, 2022 1 commit
  16. 29 Mar, 2022 3 commits
  17. 28 Mar, 2022 2 commits
      Use ifdef instead of comment for the auto-generated method declarations for... · 8e4d622f
      Paul Fultz II authored
      Use ifdef instead of comment for the auto-generated method declarations for type erased classes (#1138)
      
      It seems the formatting of comments is unreadable for larger methods, so instead just generate a struct with the methods in the interface and add a comment if a method is optional. It wraps this in #ifdef TYPE_ERASED_DECLARATION (assuming this would never be defined) instead of #if 0, so most editors can still provide syntax highlighting (although I think vscode with clangd will still gray it out, unfortunately).
      Use ccache for runtime compilation (#1131) · ad056b1f
      Paul Fultz II authored
      * Use ccache for runtime compilation
  18. 25 Mar, 2022 1 commit
  19. 24 Mar, 2022 1 commit
  20. 22 Mar, 2022 1 commit
  21. 21 Mar, 2022 1 commit
  22. 18 Mar, 2022 2 commits
  23. 15 Mar, 2022 2 commits
      Expose APIs for the MIGraphX program (#1093) · 64e79a94
      Umang Yadav authored
      The API includes the following:
      create_module,
      get_main_module
      add_instruction without module args
      add_instruction with module args
      add_parameter
      add_return
      Add iterators to kernels tensor_view and fix roialign to work with non-standard shape (#1126) · 31e63991
      Paul Fultz II authored
      This adds iterators to tensor_view, which can allow kernels to work with non-standard shapes like for roialign.
      
      To improve the performance of indexing when using the iterators, the shape class was updated to use integral_constants since the compiler doesn't always fold the const values. An integral_constant will at least enforce that in the AST.
      
      Finally, since index calculations with single integers are improved, I also updated pointwise to use a single index rather than a multi index. There is about a 4% improvement in some cases.
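      To make the indexing cost concrete, here is a minimal CPU sketch of the two steps needed to turn a single index into a memory offset for a non-standard shape. The helper names `to_multi` and `to_offset` are hypothetical and the sketch uses runtime vectors, whereas the commit relies on integral_constants so that the compiler can fold these divisions and multiplications.

```cpp
#include <cstddef>
#include <vector>

// Convert a single (linear) index into a multi index for the given
// dimensions, with the last dimension varying fastest.
std::vector<std::size_t> to_multi(std::size_t i, const std::vector<std::size_t>& lens)
{
    std::vector<std::size_t> idx(lens.size());
    for(std::size_t d = lens.size(); d > 0; d--)
    {
        idx[d - 1] = i % lens[d - 1];
        i /= lens[d - 1];
    }
    return idx;
}

// Convert a multi index into a memory offset using the strides.
std::size_t to_offset(const std::vector<std::size_t>& idx,
                      const std::vector<std::size_t>& strides)
{
    std::size_t offset = 0;
    for(std::size_t d = 0; d < idx.size() && d < strides.size(); d++)
        offset += idx[d] * strides[d];
    return offset;
}
```

      For a standard (packed, row-major) shape the offset equals the single index, so the conversion can be skipped entirely; for a transposed layout such as lens `{2, 3}` with strides `{1, 2}` it cannot.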
  24. 14 Mar, 2022 3 commits
  25. 11 Mar, 2022 1 commit
      Improve print ins (#1096) · b3b44f5d
      Shucai Xiao authored
      The module::debug_print(ins) call is very slow, which makes trace_eval==1/2 very slow. The reason is that printing an instruction involves searching the whole module to find the instruction and then printing it. This change fixes that by calling module::print() once to get the names of all instructions in the program, and then printing each instruction by looking up its name in a hash map.
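      The approach can be sketched as follows. This is a simplified illustration, not the MIGraphX code: the `instruction` struct, the `"@<n>"` naming, and the functions `build_names` and `print_instruction` are all assumptions. The name table is built in one pass, after which each print is a hash lookup rather than a scan of the whole module.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified instruction type for this sketch.
struct instruction
{
    std::string op;
};

// Build the name table once: one pass over all instructions.
std::unordered_map<const instruction*, std::string>
build_names(const std::vector<instruction>& instructions)
{
    std::unordered_map<const instruction*, std::string> names;
    for(std::size_t i = 0; i < instructions.size(); i++)
        names[&instructions[i]] = "@" + std::to_string(i);
    return names;
}

// Print a single instruction using a constant-time name lookup
// instead of searching the whole list each time.
std::string print_instruction(const instruction& ins,
                              const std::unordered_map<const instruction*, std::string>& names)
{
    auto it          = names.find(&ins);
    std::string name = (it == names.end()) ? "@?" : it->second;
    return name + " = " + ins.op;
}
```

      Printing every instruction this way costs one pass to build the table plus one lookup per instruction, instead of one full search per instruction.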
  26. 09 Mar, 2022 3 commits
  27. 08 Mar, 2022 1 commit
  28. 07 Mar, 2022 1 commit