1. 01 Jun, 2022 3 commits
  2. 31 May, 2022 3 commits
  3. 30 May, 2022 1 commit
      Improve eliminate contiguous pass (#1223) · 86061b4d
      shivadbhavsar authored
      Follow-up to issue #1166 and PR #1220. Parallelizing the eval calls with the same approach as in #1220 significantly reduces the time spent in the eliminate_contiguous pass.
  4. 27 May, 2022 1 commit
  5. 26 May, 2022 2 commits
      Parallelize evaluations in propagate_constant (#1220) · bf603a76
      shivadbhavsar authored
      Addresses issue #1166. The propagate_constant pass currently uses a recursive approach to find all instructions in a module that can be evaluated to a literal, and performs the replacement in the same call.
      
      New approach:

      1. Perform a single pass through the instructions in the module to determine which instructions can be evaluated
      2. Evaluate the selected instructions in parallel
      3. Replace the selected instructions with the corresponding literals
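
The three steps can be sketched roughly as follows. This is a toy illustration, not the MIGraphX implementation: the dictionary-based instruction format, the `OPS` table, and the function name are all hypothetical, and the module is assumed to be listed in topological order (inputs before users). Python threads will not speed up pure-Python arithmetic; the point is only the select / evaluate / replace structure.

```python
from concurrent.futures import ThreadPoolExecutor
import operator

# Hypothetical toy operator table; not MIGraphX's operator set.
OPS = {"add": operator.add, "mul": operator.mul}


def propagate_constant(module):
    """Return a copy of `module` with constant instructions folded to literals."""
    by_name = {ins["name"]: ins for ins in module}

    # Step 1: a single pass marks which instructions depend only on literals.
    is_const = {}
    for ins in module:
        if ins["op"] == "literal":
            is_const[ins["name"]] = True
        elif ins["op"] in OPS:
            is_const[ins["name"]] = all(is_const[i] for i in ins["inputs"])
        else:  # e.g. a runtime parameter
            is_const[ins["name"]] = False
    selected = [
        ins["name"]
        for ins in module
        if is_const[ins["name"]] and ins["op"] != "literal"
    ]

    def evaluate(name):
        # Read-only walk over by_name, so concurrent calls do not interfere.
        ins = by_name[name]
        if ins["op"] == "literal":
            return ins["value"]
        return OPS[ins["op"]](*(evaluate(i) for i in ins["inputs"]))

    # Step 2: evaluate the selected instructions in parallel.
    with ThreadPoolExecutor() as pool:
        values = dict(zip(selected, pool.map(evaluate, selected)))

    # Step 3: replace each selected instruction with its literal.
    return [
        {"name": ins["name"], "op": "literal", "value": values[ins["name"]]}
        if ins["name"] in values
        else ins
        for ins in module
    ]


module = [
    {"name": "a", "op": "literal", "value": 2},
    {"name": "x", "op": "param"},
    {"name": "b", "op": "add", "inputs": ["a", "a"]},
    {"name": "c", "op": "mul", "inputs": ["b", "x"]},
    {"name": "d", "op": "mul", "inputs": ["b", "a"]},
]
result = propagate_constant(module)
```

Here `b` and `d` are folded to literals, while `c` is left alone because it depends on the parameter `x`. Separating selection from evaluation is what makes the evaluations independent of each other, which the recursive find-and-replace approach does not allow.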
      Upgrade to cppcheck 2.8 and fix new issues found (#1225) · a401e72a
      Paul Fultz II authored
      * Upgrade to cppcheck 2.8
  6. 25 May, 2022 2 commits
  7. 24 May, 2022 4 commits
  8. 20 May, 2022 4 commits
  9. 19 May, 2022 2 commits
  10. 17 May, 2022 3 commits
  11. 13 May, 2022 1 commit
      Update install_prereqs.sh for individual use (#1197) · 8c94ad07
      Chris Austen authored
      Our documentation indicates that a user with sudo can run the install_prereqs.sh file. It turns out the file is not complete enough to run on its own on Ubuntu 18.04/20.04. I updated the file to resolve the failures.
      
      resolves #1191
  12. 11 May, 2022 2 commits
  13. 10 May, 2022 1 commit
  14. 09 May, 2022 1 commit
  15. 06 May, 2022 2 commits
  16. 05 May, 2022 1 commit
      Cppcheck fixes (#1195) · d582425b
      Paul Fultz II authored
      Fixes the #error when using cppcheck. This no longer suppresses cppcheck errors when including those errors. It also fixes the cppcheck errors that were already there.
  17. 03 May, 2022 1 commit
  18. 02 May, 2022 1 commit
  19. 29 Apr, 2022 1 commit
  20. 27 Apr, 2022 1 commit
      Add lane reduction (#1180) · 4c72cc95
      Paul Fultz II authored
      With reductions such as {2048, 2, 1456} on axis 1, this is about 22x faster than our new block_reduce, and it's even over 100x faster than our original reduce_sum:
      
      # lane
      gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
      # block
      gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
      # original
      gpu::reduce_sum[axes={1}]: 6.73456ms
      There is some basic logic to pick between lane and block reduce automatically.
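
As a quick check on the speedups, the ratios can be recomputed from the three timings listed above. The figures quoted in the message are rounded; this sketch only redoes the arithmetic, and the variable and function names are illustrative.

```python
# Timings in milliseconds, copied from the log lines above.
timings_ms = {
    "lane": 0.0672928,
    "block": 1.46072,
    "original": 6.73456,
}


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return timings_ms[baseline] / timings_ms[candidate]


lane_vs_block = speedup("block", "lane")
lane_vs_original = speedup("original", "lane")

# Number of output elements left after reducing axis 1 of {2048, 2, 1456}.
output_elements = 2048 * 1456

print(f"lane vs block:    {lane_vs_block:.1f}x")
print(f"lane vs original: {lane_vs_original:.1f}x")
print(f"output elements:  {output_elements}")
```

The output element count equals the `global=2981888` value in the lane log line, which would be consistent with one work-item per output element, although the log itself does not state that.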
  21. 26 Apr, 2022 1 commit
  22. 23 Apr, 2022 1 commit
      ReverseSequence op (#1177) · 31906785
      Charlie Lin authored
      Implements the ReverseSequence ONNX operator as a parser.
      
      This parser can only handle a constant sequence_lens input. This is the same as what is handled for TensorRT as far as I can tell.
      We could handle a variable sequence_lens input; that would require ref and GPU implementations of the operator.
      The ONNX backend tests are disabled because this does not handle variable sequence_lens.
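
For reference, the operator's behavior with a constant sequence_lens can be sketched in a few lines. This is a plain-Python illustration of the ONNX semantics, not the parser added in this commit; it assumes a 2-D input with batch_axis=0 and time_axis=1, and the function name is hypothetical.

```python
def reverse_sequence(data, sequence_lens):
    """Reverse the first sequence_lens[i] items of row i; keep the tail as is.

    Assumes a 2-D list with batch_axis=0 and time_axis=1.
    """
    if len(data) != len(sequence_lens):
        raise ValueError("need one sequence length per batch entry")
    result = []
    for row, n in zip(data, sequence_lens):
        if not 0 <= n <= len(row):
            raise ValueError("sequence length out of range")
        result.append(row[:n][::-1] + row[n:])
    return result


data = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
out = reverse_sequence(data, [1, 2, 3, 4])
# out == [[0, 1, 2, 3], [5, 4, 6, 7], [10, 9, 8, 11], [15, 14, 13, 12]]
```

Because the lengths are known up front, the per-row slice boundaries are fixed, which is what allows the operator to be handled at parse time instead of needing a dedicated runtime kernel.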
  23. 19 Apr, 2022 1 commit
      Refactor Pooling and implement ONNX LpPool and GlobalLpPool (#1152) · 764273e4
      Charlie Lin authored
      Refactored the reference implementation of pooling to something like what was done for roialign. Moved the reference implementation of pooling from targets/ref/lowering.cpp to pooling.hpp.
      Removed cpu_pooling, instead using reference pooling in pooling.hpp
      Added reference implementation of Lp Norm pooling and the global version
      Added tests for the Lp Norm Pooling
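
As a reminder of what Lp norm pooling computes, here is a minimal 1-D sketch: each output is the p-norm of the values in its window, and the global version uses a single window over the whole input. It assumes no padding and is not the code in pooling.hpp; the function names are hypothetical.

```python
def lp_pool_1d(values, kernel, stride, p):
    """Lp norm pooling over a 1-D list: (sum |x|^p)^(1/p) per window, no padding."""
    out = []
    for start in range(0, len(values) - kernel + 1, stride):
        window = values[start:start + kernel]
        out.append(sum(abs(v) ** p for v in window) ** (1.0 / p))
    return out


def global_lp_pool_1d(values, p):
    """Global variant: one window that covers the whole input."""
    return lp_pool_1d(values, kernel=len(values), stride=1, p=p)


pooled = lp_pool_1d([3.0, -4.0, 6.0, 8.0], kernel=2, stride=2, p=2)
global_l1 = global_lp_pool_1d([3.0, -4.0], p=1)
```

With p=2 each window gives its Euclidean norm, and with p=1 it gives the sum of absolute values.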