1. 29 Apr, 2022 1 commit
  2. 26 Apr, 2022 1 commit
  3. 23 Apr, 2022 1 commit
    • ReverseSequence op (#1177) · 31906785
      Charlie Lin authored
      Implements the ReverseSequence ONNX operator as a parser.
      
      This parser can only handle a constant sequence_lens input. As far as I can tell, this matches what TensorRT handles.
      We could handle a variable sequence_lens input; that would require ref and GPU implementations of the operator.
      The ONNX backend tests are disabled because this does not handle variable sequence_lens.
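As a rough illustration of what the operator computes (not of how the parser lowers it), the C++ sketch below implements ReverseSequence for a row-major 2-D input of shape [T, B] with the ONNX default axes (time_axis = 0, batch_axis = 1). The function name reverse_sequence is hypothetical. The sketch assumes the sequence_lens values are already known, which is the constant-input case this parser supports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference semantics of ONNX ReverseSequence for a 2-D input of shape [T, B]
// stored row-major, with time_axis = 0 and batch_axis = 1 (the ONNX defaults).
// For each batch column b, the first seq_lens[b] time steps are reversed and the
// remaining steps are copied through unchanged.
// `reverse_sequence` is a hypothetical helper, not a MIGraphX function.
std::vector<float> reverse_sequence(const std::vector<float>& input,
                                    std::size_t time_steps,
                                    std::size_t batch,
                                    const std::vector<std::int64_t>& seq_lens)
{
    assert(input.size() == time_steps * batch);
    assert(seq_lens.size() == batch);
    std::vector<float> output = input; // untouched tail is already in place
    for(std::size_t b = 0; b < batch; ++b)
    {
        // Precondition: 0 <= seq_lens[b] <= time_steps.
        assert(seq_lens[b] >= 0 && static_cast<std::size_t>(seq_lens[b]) <= time_steps);
        const auto len = static_cast<std::size_t>(seq_lens[b]);
        for(std::size_t t = 0; t < len; ++t)
            output[t * batch + b] = input[(len - 1 - t) * batch + b];
    }
    return output;
}
```

For sequence_lens = {4, 3, 2, 1}, batch column 0 is fully reversed, column 1 has its first three steps reversed, and column 3 is copied through unchanged.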
  4. 19 Apr, 2022 1 commit
    • Refactor Pooling and implement ONNX LpPool and GlobalLpPool (#1152) · 764273e4
      Charlie Lin authored
      Refactored the reference implementation of pooling to follow the approach used for roialign, and moved it from targets/ref/lowering.cpp to pooling.hpp.
      Removed cpu_pooling; the reference pooling in pooling.hpp is used instead
      Added a reference implementation of Lp norm pooling and its global version
      Added tests for Lp norm pooling
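LpPool replaces the max or average of each pooling window with its Lp norm, (sum |x_i|^p)^(1/p), and GlobalLpPool applies the same norm over the whole spatial extent. The sketch below shows this in one dimension with no padding; the helper names are hypothetical, and it is not the pooling.hpp implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Lp norm of one pooling window: (sum_i |x_i|^p)^(1/p).
double lp_norm(const double* first, std::size_t count, int p)
{
    assert(p > 0);
    double acc = 0.0;
    for(std::size_t i = 0; i < count; ++i)
        acc += std::pow(std::abs(first[i]), p);
    return std::pow(acc, 1.0 / p);
}

// Hypothetical 1-D LpPool with no padding: slide a window of `kernel`
// elements by `stride` and take the Lp norm of each window.
std::vector<double> lp_pool_1d(const std::vector<double>& x,
                               std::size_t kernel,
                               std::size_t stride,
                               int p)
{
    assert(kernel > 0 && stride > 0);
    std::vector<double> y;
    for(std::size_t start = 0; start + kernel <= x.size(); start += stride)
        y.push_back(lp_norm(x.data() + start, kernel, p));
    return y;
}

// Hypothetical 1-D GlobalLpPool: one window spanning the whole input.
double global_lp_pool_1d(const std::vector<double>& x, int p)
{
    return lp_norm(x.data(), x.size(), p);
}
```

With p = 2 (the ONNX default), a window holding {3, 4} pools to 5.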
  5. 17 Apr, 2022 1 commit
    • Reduce with runtime compilation (#1150) · f9a5b81e
      Paul Fultz II authored
      There is a significant improvement on larger tensors, with half almost 50% faster:
      
      lens: [1024, 384, 768]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
      gpu::reduce_sum[axes={2}]: 1.73126ms
      Also for non-trivial layouts this can sometimes be over 2x faster:
      
      lens: [64, 1024, 768, 4]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
      gpu::reduce_sum[axes={1}]: 2.63375ms
      Of course, as the stride grows, this speedup diminishes because of poor memory access patterns. Such kernels need a lane_reduce instead of a block_reduce; I plan to address that in a future PR.
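To see why the stride matters, the sketch below sums one axis of a tensor in standard (packed row-major) layout on the CPU. It is not the GPU kernel, and reduce_sum_axis is a hypothetical helper. The terms of a single reduction sit `inner` elements apart, where `inner` is the product of the lengths after the reduced axis: 1 for axes={2} on [1024, 384, 768], but 768 * 4 = 3072 for axes={1} on [64, 1024, 768, 4] if that tensor were in standard layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum-reduce one axis of a tensor stored in standard (packed row-major) layout.
// `reduce_sum_axis` is a hypothetical helper used only for illustration.
std::vector<float> reduce_sum_axis(const std::vector<float>& data,
                                   const std::vector<std::size_t>& lens,
                                   std::size_t axis)
{
    assert(axis < lens.size());
    // outer: product of lengths before the axis; inner: product after it.
    // `inner` is also the stride (in elements) of the reduced axis.
    std::size_t outer = 1;
    std::size_t inner = 1;
    for(std::size_t i = 0; i < axis; ++i)
        outer *= lens[i];
    for(std::size_t i = axis + 1; i < lens.size(); ++i)
        inner *= lens[i];
    const std::size_t n = lens[axis];
    assert(data.size() == outer * n * inner);

    std::vector<float> out(outer * inner, 0.0f);
    for(std::size_t o = 0; o < outer; ++o)
    {
        for(std::size_t i = 0; i < inner; ++i)
        {
            float acc = 0.0f;
            // Consecutive terms of one reduction are `inner` elements apart in memory.
            for(std::size_t k = 0; k < n; ++k)
                acc += data[(o * n + k) * inner + i];
            out[o * inner + i] = acc;
        }
    }
    return out;
}
```

When the reduced axis is the last one, the innermost loop walks contiguous memory; for any other axis each step jumps `inner` elements, which is the access pattern that degrades as the stride grows.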
      
      Finally, this also adds a MIGRAPHX_GPU_DUMP_ASM environment variable that prints the assembly when a kernel is compiled.
  6. 14 Apr, 2022 1 commit
    • Half2 overloads (#1157) · 12007dba
      bpickrel authored
      Addresses issue 1127. Updates the math.hpp header file to provide overloads of various standard functions (ops) for the HIP half2 type. The half2 type is two 16-bit floats packed into a 32-bit value, so the overloads act on vectors whose sizes are multiples of 2. They are invoked during runtime compilation whenever one of the ops is called on a tensor declared with the data type shape::half_type.
      
      Defined a new template, instantiated it for the math operations that the HIP library provides, and added verify tests for the sqrt operator covering three cases:
      
      tensor size not divisible by 2
      tensor size divisible by 2 but not by 4
      tensor size divisible by 4
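The first test case follows from the packing: an op that processes two elements at a time needs some way to deal with a leftover element. The sketch below mimics this on the CPU with a hypothetical packed2 struct of two floats standing in for half2, and assumes a scalar fallback for an odd trailing element; the actual math.hpp overloads may route odd sizes differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a packed pair of values. HIP's half2 packs two
// 16-bit floats; plain floats are used here so the sketch runs on a CPU.
struct packed2
{
    float x;
    float y;
};

// One "vector" op handles both lanes of a packed pair.
packed2 sqrt2(packed2 v) { return {std::sqrt(v.x), std::sqrt(v.y)}; }

// Walk the buffer two elements at a time. If the size is odd, the last
// element is handled with the scalar op (an assumed fallback strategy).
std::vector<float> packed_sqrt(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    std::size_t i = 0;
    for(; i + 1 < in.size(); i += 2)
    {
        const packed2 r = sqrt2({in[i], in[i + 1]});
        out[i]     = r.x;
        out[i + 1] = r.y;
    }
    assert(in.size() - i <= 1); // at most one element is left over
    if(i < in.size())
        out[i] = std::sqrt(in[i]);
    return out;
}
```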
  7. 11 Apr, 2022 1 commit
    • scatter operator refactoring to include reduction (#1124) · 701c2014
      bpickrel authored
      Changes the "scatter" struct and op into a base/child set of three, scatter_none, scatter_add, and scatter_mul, to mirror ONNX's ScatterElements op and its three reduction options. (The ONNX Scatter op is deprecated and is equivalent to scatter_none.)
      
      Provides both a reference op and an update to ONNX parsing. Existing tests were updated and a new test case was added.
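The three reduction options differ only in how each update is combined with the existing element. The 1-D sketch below expresses that with a reduction functor passed as a template parameter. The functor and function names are hypothetical, and this is a simplification of the base/child structs described above, not MIGraphX's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reduction functors mirroring ONNX ScatterElements'
// reduction attribute ("none", "add", "mul").
struct assign_reduction { void operator()(float& dst, float u) const { dst = u; } };
struct add_reduction    { void operator()(float& dst, float u) const { dst += u; } };
struct mul_reduction    { void operator()(float& dst, float u) const { dst *= u; } };

// 1-D ScatterElements (axis 0): start from a copy of `data`, then fold each
// update into the position named by the matching index. Negative indices
// count from the end, as in ONNX.
template <class Reduction>
std::vector<float> scatter_elements(const std::vector<float>& data,
                                    const std::vector<std::int64_t>& indices,
                                    const std::vector<float>& updates,
                                    Reduction reduce)
{
    assert(indices.size() == updates.size());
    std::vector<float> out = data;
    const auto n = static_cast<std::int64_t>(data.size());
    for(std::size_t i = 0; i < indices.size(); ++i)
    {
        const std::int64_t idx = indices[i] < 0 ? indices[i] + n : indices[i];
        assert(idx >= 0 && idx < n);
        reduce(out[static_cast<std::size_t>(idx)], updates[i]);
    }
    return out;
}
```

Passing the reduction as a type keeps a single scatter loop for all three variants, which is roughly the role a shared base struct plays in the refactoring.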
  8. 08 Apr, 2022 1 commit
  9. 06 Apr, 2022 1 commit
  10. 01 Apr, 2022 1 commit
    • Update developer overview, fix doc CMakeLists (#1140) · 0295965d
      Charlie Lin authored
      * Fix and change doc CMakeLists
      1. Fix the include directory location after the change from #1088
      2. Create a DoxygenWarningLog.txt file in <build_dir>/doc/doxygen
      3. Move compiled HTML or PDF files to <build_dir>/doc/[pdf, html]
  11. 31 Mar, 2022 1 commit
  12. 29 Mar, 2022 2 commits
  13. 25 Mar, 2022 1 commit
  14. 24 Mar, 2022 1 commit
  15. 21 Mar, 2022 1 commit
  16. 18 Mar, 2022 2 commits
  17. 15 Mar, 2022 1 commit
  18. 11 Mar, 2022 1 commit
    • Improve print ins (#1096) · b3b44f5d
      Shucai Xiao authored
      module::debug_print(ins) is very slow, which makes trace_eval==1/2 very slow. The reason is that printing an instruction involves searching the whole module to find the instruction and then printing it. This change fixes that by calling module::print() to get the names of all instructions in a program, then printing each instruction by looking up its name in a hash map.
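The fix trades a linear search per printed instruction for one pass that builds a name table. A minimal sketch of that idea, using hypothetical instruction and module_t stand-ins and assuming "@N"-style instruction names purely for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins: a module is a list of instructions, and each
// instruction is printed as "<name> = <op>".
struct instruction
{
    std::string op;
};
using module_t = std::list<instruction>;
using name_map = std::unordered_map<const instruction*, std::string>;

// One O(n) pass names every instruction ("@0", "@1", ...).
name_map build_names(const module_t& m)
{
    name_map names;
    std::size_t i = 0;
    for(const auto& ins : m)
        names.emplace(&ins, "@" + std::to_string(i++));
    assert(names.size() == m.size()); // every instruction has a distinct address
    return names;
}

// Printing one instruction is now an average O(1) hash lookup instead of a
// search through the whole module. at() throws if the instruction is unknown.
std::string print_ins(const instruction& ins, const name_map& names)
{
    return names.at(&ins) + " = " + ins.op;
}
```

Building the table once and reusing it for every traced instruction is what removes the per-instruction search cost.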
  19. 09 Mar, 2022 3 commits
  20. 08 Mar, 2022 1 commit
  21. 07 Mar, 2022 1 commit
  22. 04 Mar, 2022 2 commits
  23. 03 Mar, 2022 1 commit
  24. 02 Mar, 2022 2 commits
  25. 25 Feb, 2022 3 commits
  26. 24 Feb, 2022 1 commit
    • Some cmake fixes and updates (#1088) · cd0a4aa5
      Paul Fultz II authored
      Make doc/CMakeLists.txt standalone
      Switch to use rocm-cmake modules for document generation
      Add CONFIGURE_DEPENDS to file(GLOB) so it will update without an explicit cmake run
      Add STRINGS property for build type to make it easier to switch build types with ccmake
      Various fixes and improvements
  27. 23 Feb, 2022 1 commit
    • Keep std shape (#1059) · 98dfdf15
      Shucai Xiao authored
      This PR resolves two problems from issue #999: non-standard-shape inputs to reshape and to reduce_mean.
      Three fixes:
      
      Any operator that requires a standard input shape gets a contiguous inserted on its input.
      In eliminate_contiguous, when computing whether a contiguous can be removed, use all of the updated args, not just the one being checked.
      In two optimizations in simplify_reshape, contiguous is removed from the reshaper name list, since eliminate_contiguous will remove a contiguous whenever it can be removed.
      For the first fix, the solution is to add an attribute to operators that require a standard input shape; the auto_contiguous pass then adds a contiguous to every input of such operators.
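As a simplified illustration of what "standard shape" means here, the sketch below treats a shape as standard when its strides are packed row-major, and shows the copy a contiguous performs on a non-standard (transposed) view. The helper names are hypothetical, and MIGraphX's own definition of a standard shape may cover more cases than this check does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Packed row-major strides for the given lengths, e.g. {2, 3, 4} -> {12, 4, 1}.
std::vector<std::size_t> standard_strides(const std::vector<std::size_t>& lens)
{
    std::vector<std::size_t> strides(lens.size(), 1);
    for(std::size_t i = lens.size(); i > 1; --i)
        strides[i - 2] = strides[i - 1] * lens[i - 1];
    return strides;
}

// Simplified check: a shape counts as standard when its strides are packed row-major.
bool is_standard(const std::vector<std::size_t>& lens, const std::vector<std::size_t>& strides)
{
    return strides == standard_strides(lens);
}

// What a contiguous does: read a strided view element by element in row-major
// index order and write it out densely.
std::vector<float> make_contiguous(const std::vector<float>& data,
                                   const std::vector<std::size_t>& lens,
                                   const std::vector<std::size_t>& strides)
{
    assert(lens.size() == strides.size());
    std::size_t total = 1;
    for(auto len : lens)
        total *= len;
    std::vector<float> out(total);
    std::vector<std::size_t> idx(lens.size(), 0); // current multi-index
    for(std::size_t n = 0; n < total; ++n)
    {
        std::size_t offset = 0;
        for(std::size_t d = 0; d < lens.size(); ++d)
            offset += idx[d] * strides[d];
        out[n] = data[offset];
        // Advance the multi-index, last dimension fastest.
        for(std::size_t d = lens.size(); d-- > 0;)
        {
            if(++idx[d] < lens[d])
                break;
            idx[d] = 0;
        }
    }
    return out;
}
```

A transposed view of a [2, 3] buffer has lens {3, 2} and strides {1, 3}, so it is not standard; an operator requiring a standard input would receive the densely copied result of make_contiguous instead.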
  28. 16 Feb, 2022 2 commits
  29. 11 Feb, 2022 1 commit
  30. 09 Feb, 2022 2 commits