1. 31 Jan, 2022 3 commits
  2. 28 Jan, 2022 2 commits
  3. 27 Jan, 2022 1 commit
  4. 26 Jan, 2022 1 commit
    • Add HardSwish op ONNX parser (#1066) · 7477aeb8
      turneram authored
      Add HardSwish to HardSigmoid parser
      
      HardSwish formula is y = x * HardSigmoid<alpha=1/6, beta=0.5>(x)
      If the op name is HardSwish, the HardSigmoid parser sets alpha to 1/6 and adds the mul instruction.
      
      Resolves #1062
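A minimal scalar sketch of the HardSwish formula above. It assumes the ONNX definition HardSigmoid(x) = max(0, min(1, alpha * x + beta)). The names hard_sigmoid and hard_swish are hypothetical. This is not the MIGraphX parser, which emits instructions rather than computing values directly.

```cpp
#include <algorithm>
#include <cassert>

// HardSigmoid per the ONNX definition: max(0, min(1, alpha * x + beta)).
// Hypothetical helper for illustration, not a MIGraphX function.
double hard_sigmoid(double x, double alpha, double beta)
{
    return std::max(0.0, std::min(1.0, alpha * x + beta));
}

// HardSwish = x * HardSigmoid<alpha=1/6, beta=0.5>(x): the HardSigmoid
// result followed by one extra multiply by the input (the added mul).
double hard_swish(double x)
{
    return x * hard_sigmoid(x, 1.0 / 6.0, 0.5);
}
```

With alpha = 1/6 and beta = 0.5, the clamp saturates at x = -3 and x = 3. Outside that range, hard_swish returns 0 or x itself.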
  5. 21 Jan, 2022 4 commits
  6. 17 Jan, 2022 1 commit
  7. 11 Jan, 2022 1 commit
    • HardSigmoid ONNX parser (#1040) · fc42d852
      turneram authored
      Add HardSigmoid ONNX parser and unit tests
      Produces the mathematical equivalent of the ONNX operator by combining existing pointwise ops.
      Resolves #1028
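A sketch of how HardSigmoid can be built from existing pointwise ops alone: mul by alpha, add beta, then clip with min and max. It uses the ONNX defaults alpha = 0.2 and beta = 0.5. The helpers pointwise and hard_sigmoid_pointwise are hypothetical. Each step is a separate pass over a flat buffer, standing in for one instruction with a broadcast scalar operand.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// One binary pointwise op between a flattened tensor and a scalar,
// standing in for an instruction with a broadcast literal operand.
std::vector<float> pointwise(const std::vector<float>& x, float s,
                             const std::function<float(float, float)>& op)
{
    std::vector<float> out(x.size());
    std::transform(x.begin(), x.end(), out.begin(), [&](float v) { return op(v, s); });
    return out;
}

// HardSigmoid(x) = max(0, min(1, alpha * x + beta)), using only mul, add, min and max.
std::vector<float> hard_sigmoid_pointwise(const std::vector<float>& x,
                                          float alpha = 0.2f, float beta = 0.5f)
{
    auto y = pointwise(x, alpha, std::multiplies<float>{});
    y      = pointwise(y, beta, std::plus<float>{});
    y      = pointwise(y, 1.0f, [](float a, float b) { return std::min(a, b); });
    y      = pointwise(y, 0.0f, [](float a, float b) { return std::max(a, b); });
    return y;
}
```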
  8. 10 Jan, 2022 1 commit
  9. 05 Jan, 2022 1 commit
  10. 09 Dec, 2021 2 commits
    • Softmax perf optimization (#1014) · 2e337c7f
      Shucai Xiao authored
      Changed the number of threads in a block from 256 to 128
      Increased the max number of blocks in the kernel from 256 to 1M.
      When the axis is the last dimension, we removed the index computation since it is not required.

      With these changes, we get about a 2x speedup over the develop branch for the softmax op used in the BertSquad model.
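Here is one way to see why the last-axis case needs no index computation. In a standard row-major tensor, each row along the last axis is contiguous, so element j of row r sits at offset r * n + j. The sketch below is a CPU illustration of that case, not the GPU kernel from the PR. The function name softmax_last_axis is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over the last axis of a row-major tensor stored flat in `data`,
// where `n` is the length of the last dimension. Rows are contiguous, so
// element j of the row starting at `base` is data[base + j]: no
// multi-dimensional index has to be decoded.
std::vector<float> softmax_last_axis(const std::vector<float>& data, std::size_t n)
{
    assert(n > 0 && data.size() % n == 0);
    std::vector<float> out(data.size());
    for(std::size_t base = 0; base < data.size(); base += n)
    {
        // Subtract the row max for numerical stability.
        float m   = *std::max_element(data.begin() + base, data.begin() + base + n);
        float sum = 0.0f;
        for(std::size_t j = 0; j < n; ++j)
        {
            out[base + j] = std::exp(data[base + j] - m);
            sum += out[base + j];
        }
        for(std::size_t j = 0; j < n; ++j)
            out[base + j] /= sum;
    }
    return out;
}
```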
    • Fuse last instruction in fuse_pointwise (#1015) · e758d457
      Paul Fultz II authored
      Fuse last instruction in fuse_pointwise
      This also fixes a bug caused by using an invalid iterator.
  11. 08 Dec, 2021 1 commit
  12. 07 Dec, 2021 1 commit
  13. 02 Dec, 2021 1 commit
  14. 30 Nov, 2021 2 commits
  15. 25 Nov, 2021 1 commit
    • Non std shape auto contiguous (#1001) · 2d4dcc47
      Shucai Xiao authored
      Resolves a problem in parsing the ssd-10 model.
      
      The problem is that, after the auto_contiguous pass inserts a contiguous, the standard output shape of some operators becomes non-standard. If the next operator requires a standard input shape, an exception is thrown.
      
      For example, consider the following model:
      Input (standard shape) -> transpose (transposed) -> softmax (transposed) -> transpose (standard) -> gather.
      It works fine, and no contiguous is required.
      
      In the auto_contiguous pass, a contiguous is inserted after the first transpose. The first transpose is then replaced by the contiguous, and all shapes are recomputed. When the gather operator is reached, its input has a transposed shape, and an exception is thrown.
      
      The fix is in the recompute_shape() function. When it is called by the auto_contiguous pass and an instruction's shape changes to a non-standard shape, the shapes of its outputs are not recomputed. This is safe because a contiguous op will be added after that instruction, and that contiguous will recompute the shapes of the later operators.
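The sketch below makes the standard/non-standard distinction concrete. It uses a simplified model in which a shape is its lengths plus strides, and "standard" means exactly the packed row-major strides. MIGraphX's real shape class covers more cases, such as broadcasting. The names simple_shape, standard_strides, is_standard, transpose and contiguous are hypothetical. The sketch shows that a transpose yields a non-standard shape, that a second transpose can restore a standard one (as in the example model), and that contiguous always produces a standard shape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified shape: dimension lengths plus per-dimension strides (in elements).
// Hypothetical type for illustration, not migraphx::shape.
struct simple_shape
{
    std::vector<std::size_t> lens;
    std::vector<std::size_t> strides;
};

// Packed row-major strides for the given lengths, e.g. {2, 3, 4} -> {12, 4, 1}.
std::vector<std::size_t> standard_strides(const std::vector<std::size_t>& lens)
{
    std::vector<std::size_t> s(lens.size(), 1);
    for(std::size_t i = lens.size(); i > 1; --i)
        s[i - 2] = s[i - 1] * lens[i - 1];
    return s;
}

// Simplification: a shape is standard iff its strides are the packed row-major ones.
bool is_standard(const simple_shape& sh) { return sh.strides == standard_strides(sh.lens); }

// Transpose only permutes lens and strides; no data moves.
simple_shape transpose(const simple_shape& sh, const std::vector<std::size_t>& perm)
{
    assert(perm.size() == sh.lens.size());
    simple_shape out;
    for(std::size_t p : perm)
    {
        out.lens.push_back(sh.lens[p]);
        out.strides.push_back(sh.strides[p]);
    }
    return out;
}

// Contiguous rewrites the data in row-major order, giving a standard shape.
simple_shape contiguous(const simple_shape& sh) { return {sh.lens, standard_strides(sh.lens)}; }
```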
  16. 24 Nov, 2021 1 commit
  17. 22 Nov, 2021 1 commit
  18. 18 Nov, 2021 1 commit
  19. 17 Nov, 2021 1 commit
    • Handle removing contiguous on operators that use modules (#1005) · 785307c3
      Paul Fultz II authored
      Currently, eliminate_contiguous never removes contiguous for operators that use module inputs, because it doesn't pass the module inputs to compute_shape.
      
      - Update to pass the module inputs correctly to compute_shape
      - Fix the overloads of compute_shape so that, when passed an empty vector of module inputs, they call the overload without module inputs
      - Add tests with contiguous and a pointwise module function.
      - Move the add_pointwise function to a separate header so it can be reused across different tests
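The overload fix can be pictured as a small dispatcher: callers always pass whatever module inputs they have, and an empty list falls through to the overload without modules. This is a hedged sketch of that pattern only. The types shape_t and module_t, the operator example_op, and the free function compute_shape are all hypothetical, with shapes modeled as strings so the selected overload is visible. It does not reproduce MIGraphX's type-erased operation interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins: shapes are strings, modules only carry a name.
using shape_t = std::string;
struct module_t
{
    std::string name;
};

// An operator offering both compute_shape overloads.
struct example_op
{
    shape_t compute_shape(const std::vector<shape_t>& inputs) const
    {
        return "plain(" + std::to_string(inputs.size()) + ")";
    }
    shape_t compute_shape(const std::vector<shape_t>& inputs,
                          const std::vector<const module_t*>& mods) const
    {
        return "with_modules(" + std::to_string(inputs.size()) + "," +
               std::to_string(mods.size()) + ")";
    }
};

// Dispatcher: an empty module list selects the overload without module inputs,
// so a pass can forward module inputs unconditionally.
template <class Op>
shape_t compute_shape(const Op& op,
                      const std::vector<shape_t>& inputs,
                      const std::vector<const module_t*>& mods)
{
    if(mods.empty())
        return op.compute_shape(inputs);
    return op.compute_shape(inputs, mods);
}
```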
  20. 15 Nov, 2021 1 commit
  21. 11 Nov, 2021 1 commit
    • Conditionally enable pointwise fusion (#992) · 157935ff
      Paul Fultz II authored
      This enables the pointwise fusions through the MIGRAPHX_ENABLE_POINTWISE_FUSION environment variable. It's disabled by default since the MIOpen fusions need to be refactored.

      This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math pass with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.
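A minimal sketch of gating a pass on an environment variable that is off by default. It assumes, for illustration only, that any value other than empty, "0" or "false" counts as enabled. MIGraphX has its own environment-variable helpers, and this does not claim to reproduce their parsing. The function names env_enabled and pointwise_fusion_enabled are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// True when `name` is set to something other than empty, "0" or "false".
// This truthiness rule is an assumption made for this sketch.
bool env_enabled(const char* name)
{
    const char* v = std::getenv(name);
    if(v == nullptr)
        return false;
    std::string s{v};
    return !(s.empty() || s == "0" || s == "false");
}

// Hypothetical hook: the fusion pass is skipped unless the variable opts in,
// matching "disabled by default".
bool pointwise_fusion_enabled() { return env_enabled("MIGRAPHX_ENABLE_POINTWISE_FUSION"); }
```

In a shell, the opt-in is done by prefixing the test command with MIGRAPHX_ENABLE_POINTWISE_FUSION=1.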
  22. 09 Nov, 2021 1 commit
  23. 05 Nov, 2021 1 commit
  24. 03 Nov, 2021 1 commit
    • Add tests for the DepthToSpace+Binary pointwise operations fusion (#987) · eb6abd27
      Umang Yadav authored
      In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape.
      
      If a binary pointwise operator follows DepthToSpace, migraphx can move the binary operator before the contiguous and reshape of the DepthToSpace.
      
      So, it becomes reshape-->transpose-->binary_op-->contiguous-->reshape.
      
      An explicit contiguous isn't required, since binary_op outputs a standard shape. So, it becomes reshape-->transpose-->binary-->reshape.
      
      simplify_reshapes already has a matcher that can do this transformation. This PR adds tests for cases like DepthToSpace + binary op.
      
      solves #905
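A sketch of the DepthToSpace decomposition described above. It assumes ONNX's DCR mode on NCHW input, with the first reshape to [n, bs, bs, c / (bs * bs), h, w] and transpose permutation {0, 3, 4, 1, 5, 2}. The function name depth_to_space is hypothetical. Reshapes of standard data move nothing, so the only real work is the loop that materializes the transposed view in row-major order. That is the same write pattern a pointwise op with a standard-shaped output would follow, which is consistent with the note that binary_op's standard output makes a separate contiguous unnecessary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// DepthToSpace (DCR) on a flat NCHW buffer, written as
// reshape -> transpose {0, 3, 4, 1, 5, 2} -> contiguous -> reshape.
std::vector<float> depth_to_space(const std::vector<float>& x,
                                  std::size_t n, std::size_t c, std::size_t h, std::size_t w,
                                  std::size_t bs)
{
    assert(c % (bs * bs) == 0 && x.size() == n * c * h * w);
    const std::size_t oc = c / (bs * bs);
    std::vector<float> y(x.size());
    std::size_t out = 0;
    // Walk the transposed view [n, oc, h, bs, w, bs] in row-major order
    // (what contiguous does); the final reshape to [n, oc, h * bs, w * bs] is free.
    for(std::size_t b = 0; b < n; ++b)
        for(std::size_t k = 0; k < oc; ++k)
            for(std::size_t i = 0; i < h; ++i)
                for(std::size_t bh = 0; bh < bs; ++bh)
                    for(std::size_t j = 0; j < w; ++j)
                        for(std::size_t bw = 0; bw < bs; ++bw)
                        {
                            // Source element at [b, bh, bw, k, i, j] of the first reshape.
                            std::size_t in = ((((b * bs + bh) * bs + bw) * oc + k) * h + i) * w + j;
                            y[out++] = x[in];
                        }
    return y;
}
```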
  25. 28 Oct, 2021 3 commits
    • NonMaxSuppression op ref implementation (#968) · c98b22d8
      Shucai Xiao authored
      This PR is the ref implementation of the NonMaxSuppression operator. It always returns the maximum possible output shape, which is the problem tracked in issue #948.
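A greedy CPU sketch of non-maximum suppression for a single batch and class. It assumes boxes are given as [y1, x1, y2, x2] with y1 < y2 and x1 < x2, and that a box is suppressed when its IoU with an already selected box exceeds iou_threshold. Score thresholding, center_point_box and the batch/class loops are omitted. All names are hypothetical. max_output_rows computes one static upper bound on the number of selected_indices rows. That is the kind of fixed maximum shape the commit message refers to, but the exact bound the ref implementation uses is not stated here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using box = std::array<float, 4>; // [y1, x1, y2, x2], assumed y1 < y2, x1 < x2

float area(const box& b) { return (b[2] - b[0]) * (b[3] - b[1]); }

float iou(const box& a, const box& b)
{
    float ih    = std::max(0.0f, std::min(a[2], b[2]) - std::max(a[0], b[0]));
    float iw    = std::max(0.0f, std::min(a[3], b[3]) - std::max(a[1], b[1]));
    float inter = ih * iw;
    float uni   = area(a) + area(b) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Visit boxes by descending score; keep a box unless it overlaps a kept one too much.
std::vector<std::size_t> nms(const std::vector<box>& boxes, const std::vector<float>& scores,
                             std::size_t max_out, float iou_threshold)
{
    assert(boxes.size() == scores.size());
    std::vector<std::size_t> order(boxes.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    std::vector<std::size_t> keep;
    for(std::size_t idx : order)
    {
        if(keep.size() == max_out)
            break;
        bool suppressed = std::any_of(keep.begin(), keep.end(), [&](std::size_t k) {
            return iou(boxes[k], boxes[idx]) > iou_threshold;
        });
        if(!suppressed)
            keep.push_back(idx);
    }
    return keep;
}

// Static upper bound on selected_indices rows: each (batch, class) pair
// can select at most min(max_out, num_boxes) boxes.
std::size_t max_output_rows(std::size_t batches, std::size_t classes,
                            std::size_t num_boxes, std::size_t max_out)
{
    return batches * classes * std::min(max_out, num_boxes);
}
```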
    • DepthToSpace and pointwise unary operations fusion (#986) · cf0b6d6d
      Umang Yadav authored
      In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape.
      
      This PR adds a matcher to find d2s + unary pointwise ops.
      
      Applying the matcher moves the pointwise unary operation before the contiguous and reshape of the d2s.
      So it becomes
      reshape --> transpose --> unary --> contiguous --> reshape.
      
      The motivation is that a pointwise module will later be created out of unary --> contiguous --> reshape. Codegen for this pointwise module can write out the buffer such that an explicit contiguous and reshape aren't required.
      
      This transformation is not guaranteed to improve performance, since the unary op will operate on a non-standard shape. So, we would need a tuning mechanism to decide when to apply it.
      
      #905: a PR for binary operations is pending.
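Moving the unary op ahead of contiguous and reshape does not change the values, because those d2s steps only rearrange elements. Applying an elementwise op and then permuting gives the same result as permuting and then applying it. A minimal sketch of that property, modeling the layout change as a generic gather out[i] = in[perm[i]]. The names permute and unary are hypothetical. The sketch says nothing about performance, which is the tuning concern raised above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A layout change (e.g. transpose followed by contiguous) as a gather.
std::vector<float> permute(const std::vector<float>& in, const std::vector<std::size_t>& perm)
{
    assert(in.size() == perm.size());
    std::vector<float> out(in.size());
    for(std::size_t i = 0; i < perm.size(); ++i)
        out[i] = in[perm[i]];
    return out;
}

// Elementwise unary op: each output depends only on the input at the same position.
template <class F>
std::vector<float> unary(const std::vector<float>& in, F f)
{
    std::vector<float> out(in.size());
    for(std::size_t i = 0; i < in.size(); ++i)
        out[i] = f(in[i]);
    return out;
}
```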
    • Roialign gpu impl (#972) · 912c8d22
      Shucai Xiao authored
      GPU implementation of the roialign operator, using the JIT approach to reduce the library size.
  26. 20 Oct, 2021 1 commit
    • Roialign (#952) · d7653732
      Shucai Xiao authored
      Implementation of the roialign operator. For now, we have only the ref implementation. When we run a model on the GPU, execution of this op falls back to the ref implementation.
  27. 19 Oct, 2021 2 commits
  28. 18 Oct, 2021 2 commits