1. 18 Nov, 2021 1 commit
  2. 17 Nov, 2021 1 commit
    • Handle removing contiguous on operators that use modules (#1005) · 785307c3
      Paul Fultz II authored
      Currently, eliminate_contiguous never removes contiguous for operators that use module inputs, because it doesn't pass the module inputs to compute_shape.
      
      - Update to pass the module inputs correctly to compute_shape
      - Fix the overloads of compute_shape so that when passed an empty vector of module inputs it will call the overload without module inputs
      - Add tests with contiguous and a pointwise module function
      - Move the add_pointwise function to a separate header for reuse across tests
      785307c3
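The fallback described in this commit can be sketched in Python as follows. This is not MIGraphX code: the `Relu` and `Pointwise` classes and the `can_drop_contiguous` helper are hypothetical, and the sketch assumes the pass keeps contiguous whenever recomputing the shape fails.

```python
def compute_shape(op, inputs, mod_args=()):
    """Dispatch to the right compute_shape form for `op`.

    An empty `mod_args` falls back to the form without module inputs.
    """
    if not mod_args:
        return op.compute_shape(inputs)
    return op.compute_shape(inputs, mod_args)


class Relu:
    """Hypothetical operator that takes no module inputs."""

    def compute_shape(self, inputs):
        return inputs[0]


class Pointwise:
    """Hypothetical operator that requires exactly one module input."""

    def compute_shape(self, inputs, mod_args):
        if len(mod_args) != 1:
            raise ValueError("pointwise expects exactly one module")
        return inputs[0]


def can_drop_contiguous(op, inputs_without_contiguous, mod_args=()):
    """Assumed check: contiguous is removable if the shape still computes."""
    try:
        compute_shape(op, inputs_without_contiguous, mod_args)
        return True
    except (TypeError, ValueError):
        return False
```

In this sketch, calling `can_drop_contiguous(Pointwise(), [(2, 3)])` without the module inputs always fails, which mirrors the behaviour the commit fixes.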
  3. 15 Nov, 2021 1 commit
  4. 11 Nov, 2021 1 commit
    • Conditionally enable pointwise fusion (#992) · 157935ff
      Paul Fultz II authored
      This enables pointwise fusion through the MIGRAPHX_ENABLE_POINTWISE_FUSION env variable. It's disabled by default since the MIOpen fusions need to be refactored.
      
      This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math pass with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.
      157935ff
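A minimal Python sketch of gating a pass on this environment variable. Only the variable name and the compile_ops pass come from the commit message; the `fuse_pointwise` name, the pass order, treating only the value `1` as enabled, and running compile_ops unconditionally are all assumptions.

```python
import os


def pointwise_fusion_enabled(env=None):
    """Return True when the env variable is set to "1" (assumed convention)."""
    env = os.environ if env is None else env
    return env.get("MIGRAPHX_ENABLE_POINTWISE_FUSION", "0") == "1"


def build_passes(env=None):
    """Build an illustrative pass list; the fusion step is opt-in."""
    passes = []
    if pointwise_fusion_enabled(env):
        passes.append("fuse_pointwise")  # hypothetical pass name
    passes.append("compile_ops")  # assumed to run either way
    return passes
```

Passing a plain dictionary as `env` makes the switch easy to exercise without touching the real process environment.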
  5. 09 Nov, 2021 1 commit
  6. 05 Nov, 2021 1 commit
  7. 03 Nov, 2021 1 commit
    • Add tests for the DepthToSpace+Binary pointwise operations fusion (#987) · eb6abd27
      Umang Yadav authored
      In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape.
      
      If a binary pointwise operator follows DepthToSpace, migraphx can move the binary operator before the contiguous and final reshape of the DepthToSpace.
      
      So, it becomes reshape --> transpose --> binary_op --> contiguous --> reshape.
      
      The explicit contiguous is then no longer required, since binary_op outputs a standard shape. So, it becomes reshape --> transpose --> binary_op --> reshape.
      
      simplify_reshapes already has a matcher that can do this transformation. This PR adds tests for cases like DepthToSpace + binary op.
      
      solves #905
      eb6abd27
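The rewrite can be illustrated with a small pure-Python sketch on flat buffers. It is not the simplify_reshapes matcher: it assumes the DCR mode of ONNX DepthToSpace (reshape to `[n, b, b, c/(b*b), h, w]`, then permute with `[0, 3, 4, 1, 5, 2]`), models transpose as a stride permutation, and uses addition as the binary op. All function names are hypothetical.

```python
from itertools import product


def row_major_strides(dims):
    """Strides of a standard (packed, row-major) shape."""
    strides = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    return strides


def d2s_view(n, c, h, w, b):
    """reshape --> transpose, expressed as dims and strides only (no copy)."""
    dims = [n, b, b, c // (b * b), h, w]
    strides = row_major_strides(dims)
    perm = [0, 3, 4, 1, 5, 2]
    return [dims[p] for p in perm], [strides[p] for p in perm]


def offsets(dims, strides):
    """Memory offsets of a strided view, visited in row-major index order."""
    for idx in product(*(range(d) for d in dims)):
        yield sum(i * s for i, s in zip(idx, strides))


def d2s_then_add(x, y, n, c, h, w, b):
    """reshape --> transpose --> contiguous --> reshape --> add."""
    dims, strides = d2s_view(n, c, h, w, b)
    packed = [x[off] for off in offsets(dims, strides)]  # contiguous: a copy
    return [p + q for p, q in zip(packed, y)]  # add on standard data


def d2s_add_fused(x, y, n, c, h, w, b):
    """reshape --> transpose --> add --> reshape, with no contiguous."""
    dims, strides = d2s_view(n, c, h, w, b)
    # The add reads x through the transposed strides and writes packed output.
    return [x[off] + y[k] for k, off in enumerate(offsets(dims, strides))]
```

Both functions return the same flat buffer; the fused form skips the intermediate copy because the add already writes its result in standard layout.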
  8. 28 Oct, 2021 3 commits
    • NonMaxSuppression op ref implementation (#968) · c98b22d8
      Shucai Xiao authored
      This PR is the ref implementation of the nonmaxsuppression operator. It always returns the max possible output shape, which is the problem tracked in issue #948.
      c98b22d8
    • DepthToSpace and pointwise unary operations fusion (#986) · cf0b6d6d
      Umang Yadav authored
      In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape.
      
      This PR adds a matcher to find d2s + unary pointwise ops.
      
      Applying the matcher moves the unary pointwise operation before the contiguous and final reshape of the d2s, so it becomes
      reshape --> transpose --> unary --> contiguous --> reshape.
      
      The motivation is that a pointwise module would later be created out of unary --> contiguous --> reshape. Codegen for this pointwise module can write the output buffer such that the explicit contiguous and reshape are not required.
      
      This transformation is not guaranteed to improve performance, since the unary op will operate on a non-standard shape. So, we would need some tuning mechanism to make the decision.
      
      Issue #905 is still pending a PR for binary operations.
      cf0b6d6d
    • Roialign gpu impl (#972) · 912c8d22
      Shucai Xiao authored
      GPU implementation of the roialign operator, using the jit approach to reduce the lib size.
      912c8d22
  9. 20 Oct, 2021 1 commit
    • Roialign (#952) · d7653732
      Shucai Xiao authored
      Implementation of the roialign operator. For now, only the ref implementation is available; when a model runs on the GPU, execution of this operator falls back to the ref implementation.
      d7653732
  10. 19 Oct, 2021 2 commits
  11. 18 Oct, 2021 2 commits
  12. 15 Oct, 2021 1 commit
    • Enabling rocTX markers for migraphx-driver via roctx knob (#946) · 4a71ec8c
      Cagri authored
      
      
      Added features:
      This enables wrapping each migraphx operator with rocTX markers.
      It adds a new knob, trace, to the migraphx-driver binary.
      
      Limitation:
      
      rocTX on its own does not output a file; it needs to be used with rocprof. Example command line:
      
      /opt/rocm/bin/rocprof -i ./in.txt --hip-trace --roctx-trace --flush-rate 10ms --timestamp on -d cagri_out --obj-tracking on /opt/rocm/bin/migraphx-driver trace ./resnet50-v2-7.onnx --onnx --gpu
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
      4a71ec8c
  13. 14 Oct, 2021 1 commit
  14. 13 Oct, 2021 2 commits
  15. 08 Oct, 2021 2 commits
  16. 01 Oct, 2021 2 commits
    • Add multinomial op (#954) · 0b7672d7
      turneram authored
      
      
      Add multinomial op to onnx parser with ref and GPU implementations.
      
      The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative distribution function (CDF). The multinomial operator multiplies the random values by the sum of the CDF and returns, for each value, the index of the first element of the CDF that is greater than the result. These indices represent samples randomly drawn from [0, class_size) that follow the distribution given by the input log-probabilities.
      
      Resolves #821
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
      0b7672d7
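The sampling step can be sketched in Python for a single batch row. This is an illustration, not the MIGraphX kernel. It assumes the CDF is the running sum of `exp(log_prob - max)`, and it reads "the sum of the CDF" as the total probability mass, that is, the last CDF entry. The function name is hypothetical.

```python
import math
from bisect import bisect_right
from itertools import accumulate


def multinomial(log_probs, uniform_samples):
    """Map uniform values in [0, 1) to class indices for one batch row."""
    shift = max(log_probs)  # shift for numerical stability (assumption)
    cdf = list(accumulate(math.exp(lp - shift) for lp in log_probs))
    total = cdf[-1]  # total mass of the unnormalized distribution
    # bisect_right returns the index of the first CDF entry greater than
    # the scaled random value.
    return [bisect_right(cdf, u * total) for u in uniform_samples]
```

Scaling the random values by the total mass avoids normalizing the CDF, so the lookup works directly on unnormalized probabilities.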
    • Add remaining random ops for Barracuda models (#963) · ccd08b4c
      turneram authored
      Add RandomNormal, RandomNormalLike, RandomUniform, and RandomUniformLike to onnx parser and onnx tests
      
      Each Random*/Random*Like pair is implemented with a single op_parser because the two ops share the same essential attributes and algorithm. The difference is that Random*Like gets the output type and/or shape from an input argument, while Random* takes both from attributes.
      
      Resolves #907
      Resolves #959
      ccd08b4c
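How one parser can serve both ops of a pair is sketched below in Python. The function name, the dictionary layout for attributes and inputs, the string dtype names, and the `float` default are assumptions made for illustration; the real op_parser works on ONNX node attributes.

```python
def resolve_output(op_name, attrs, inputs):
    """Pick the output dtype and shape for a Random* or Random*Like op."""
    if op_name.endswith("Like"):
        ref = inputs[0]
        # Shape comes from the input; dtype too, unless an attribute overrides it.
        dtype = attrs.get("dtype", ref["dtype"])
        shape = ref["shape"]
    else:
        # Both come from attributes; "float" as the default is an assumption.
        dtype = attrs.get("dtype", "float")
        shape = attrs["shape"]
    return dtype, tuple(shape)
```

Once the dtype and shape are resolved, the rest of the parser (seed handling, distribution parameters, literal generation) can be shared unchanged between the two ops.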
  17. 29 Sep, 2021 1 commit
  18. 27 Sep, 2021 1 commit
  19. 23 Sep, 2021 1 commit
  20. 21 Sep, 2021 1 commit
  21. 17 Sep, 2021 3 commits
  22. 16 Sep, 2021 1 commit
    • Loop operator (#853) · a275f590
      Shucai Xiao authored
      
      
      Add Loop operator for opset version 13.
      Notes:
      1) The default max iteration number is 10 if none is provided.
      2) To change the max iteration number, a user can set max_loop_iterations in the onnx_option struct when parsing a model.
      3) The returned shape of the scan output is derived from max_loop_iterations even if the actual number of loop iterations is smaller. This issue also applies to other operators like NonZero and NonMaxSuppression. Issue #948 was created to track this, to be resolved later.
      Co-authored-by: Paul <pfultz2@yahoo.com>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
      a275f590
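The shape behaviour from the notes above can be sketched in Python. This is a simplified model, not the MIGraphX implementation: the `run_loop` and `count_to_three` names, the body signature, and zero as the fill value for unused scan slots are assumptions.

```python
def run_loop(body, state, max_loop_iterations=10):
    """Run `body` until it asks to stop or the iteration cap is reached."""
    # The scan output is sized from the cap before the loop runs.
    scan = [0] * max_loop_iterations
    iterations = 0
    keep_going = True
    while keep_going and iterations < max_loop_iterations:
        keep_going, state, value = body(iterations, state)
        scan[iterations] = value
        iterations += 1
    return state, scan, iterations


def count_to_three(i, state):
    """Example body: increment the state and stop once it reaches 3."""
    new_state = state + 1
    return new_state < 3, new_state, new_state
```

With the default cap of 10, `count_to_three` stops after three iterations, yet the scan output still has ten entries. This is the mismatch that issue #948 tracks.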
  23. 10 Sep, 2021 2 commits
  24. 07 Sep, 2021 3 commits
  25. 02 Sep, 2021 2 commits
  26. 01 Sep, 2021 2 commits
    • Add a command to the driver to list supported onnx operators (#938) · 1f741f73
      Paul Fultz II authored
      * Add a command to list supported onnx operators
      1f741f73
    • Adjust HIP_COMPILER_FLAGS to support <$:$<>:> and SHELL: tags (#933) · 33a17257
      Chris Austen authored
      
      
      In ROCm 4.5.0, hip compile flags come in differently. This has
      caused some parsing issues for the HIP_COMPILER_FLAGS variable. As an example:
      
          ROCm 4.3.0: --offload-arch=gfx900
          ROCm 4.5.0: $<$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900>
      
      Using existing code...
          $<$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900>
      Becomes...
          $<$<COMPILE_LANGUAGE:CXX>:SHELL:
      
      There are two problems with that.
        1) The "<" is not balanced with a ">" because the regex consumes the ">"
        2) There is still a `SHELL:` label.
      
      This commit repairs both. I took the regex parsing code from ROCmSoftwarePlatform/MIOpen/blame/develop/CMakeLists.txt
      but improved it to support target features like
      $<$<COMPILE_LANGUAGE:CXX>:SHELL:--offload-arch=gfx900:xxx+>
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
      33a17257
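The intended unwrapping can be illustrated with a short Python sketch. The actual fix is a CMake regex in the build scripts; the pattern and the function name below are assumptions that only mirror the behaviour described in the commit message.

```python
import re

# Matches $<$<CONDITION>:payload> with an optional SHELL: prefix on the payload.
_GENEX = re.compile(r"\$<\$<[^<>]*>:(?:SHELL:)?(.*)>")


def strip_generator_expression(flag):
    """Return the plain compiler flag, unwrapping a generator expression if present."""
    match = _GENEX.fullmatch(flag)
    return match.group(1) if match else flag
```

The greedy payload group backtracks by one character so that the final `>` closes the outer expression. This keeps the brackets balanced and preserves colons inside target features such as `gfx900:xxx+`.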