1. 05 Nov, 2021 1 commit
  2. 03 Nov, 2021 1 commit
    • Add tests for the DepthToSpace+Binary pointwise operations fusion (#987) · eb6abd27
      Umang Yadav authored
      In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape.
      
      If a binary pointwise operator follows DepthToSpace, migraphx can move the binary operator before the contiguous and reshape of the DepthToSpace.
      
      So, it becomes reshape-->transpose-->binary_op-->contiguous-->reshape.
      
      An explicit contiguous is then no longer required, since binary_op outputs a standard shape. So, it becomes reshape-->transpose-->binary_op-->reshape.
      
      simplify_reshapes already has a matcher that can do this transformation. This PR adds tests for cases like DepthToSpace + binary op.
      
      solves #905
      eb6abd27
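The rewrite described above can be sketched in plain Python on flat, row-major buffers. This is only an illustration of why the explicit contiguous becomes unnecessary, not migraphx code: the helper names (`row_major_strides`, `d2s_offsets`, `d2s_then_add`, `d2s_fused_add`) are hypothetical, and the ONNX "DCR" DepthToSpace mode is assumed.

```python
from itertools import product

def row_major_strides(shape):
    """Strides (in elements) of a contiguous row-major tensor."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def d2s_offsets(n, c, h, w, b):
    """Input offsets visited when reading the transposed view in output order.

    Assumes DCR mode: reshape to [n, b, b, c/(b*b), h, w], then transpose
    with permutation [0, 3, 4, 1, 5, 2].
    """
    shape6 = [n, b, b, c // (b * b), h, w]
    perm = [0, 3, 4, 1, 5, 2]
    strides = row_major_strides(shape6)
    out_shape = [shape6[p] for p in perm]
    return [sum(i * strides[p] for i, p in zip(idx, perm))
            for idx in product(*(range(d) for d in out_shape))]

def d2s_then_add(x, bias, dims, b):
    """reshape --> transpose --> contiguous --> reshape, then the binary op."""
    gathered = [x[o] for o in d2s_offsets(*dims, b)]  # explicit contiguous copy
    return [p + q for p, q in zip(gathered, bias)]

def d2s_fused_add(x, bias, dims, b):
    """reshape --> transpose --> binary_op --> reshape.

    The binary op reads the transposed view directly and writes its result
    in standard layout, so no separate contiguous copy is made.
    """
    return [x[o] + bias[i] for i, o in enumerate(d2s_offsets(*dims, b))]

dims = (1, 4, 2, 2)               # n, c, h, w
x = list(range(16))
bias = [100 * i for i in range(16)]
same = d2s_then_add(x, bias, dims, 2) == d2s_fused_add(x, bias, dims, 2)
```

Both functions produce the same output; the fused variant simply folds the gather into the pointwise operation.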
  3. 28 Oct, 2021 3 commits
    • NonMaxSuppression op ref implementation (#968) · c98b22d8
      Shucai Xiao authored
      This PR is the ref implementation of the nonmaxsuppression operator. It always returns the max possible output shape, which is the problem tracked in issue #948.
      c98b22d8
    • DepthToSpace and pointwise unary operations fusion (#986) · cf0b6d6d
      Umang Yadav authored
      In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape.
      
      This PR adds a matcher to find d2s + unary pointwise ops.
      
      Application of the matcher moves the pointwise unary operation before the contiguous and reshape of the d2s.
      So it becomes
      reshape --> transpose --> unary --> contiguous --> reshape.
      
      The motivation is that a pointwise module would later be created out of unary --> contiguous --> reshape. Codegen for this pointwise module can write out the buffer in such a way that an explicit contiguous and reshape wouldn't be required.
      
      This transformation is not guaranteed to improve performance, since the unary op will operate on a non-standard shape. So, we would need some tuning mechanism to decide when to apply it.
      
      #905 pending PR for binary operations.
      cf0b6d6d
    • Roialign gpu impl (#972) · 912c8d22
      Shucai Xiao authored
      GPU implementation of the roialign operator, using the jit approach to reduce the lib size.
      912c8d22
  4. 20 Oct, 2021 1 commit
    • Roialign (#952) · d7653732
      Shucai Xiao authored
      Implementation of the roialign operator. For now, we have only the ref implementation. When a model runs on the GPU, execution of this operator falls back to the ref implementation.
      d7653732
  5. 19 Oct, 2021 1 commit
  6. 18 Oct, 2021 2 commits
  7. 15 Oct, 2021 1 commit
    • Enabling rocTX markers for migraphx-driver via roctx knob (#946) · 4a71ec8c
      Cagri authored
      
      
      Added features:
      This enables wrapping each migraphx operator with rocTX markers.
      It adds a new knob, trace, to the migraphx-driver binary.
      
      Limitation:
      
      rocTX on its own does not output a file; it needs to be used with rocprof. Example command line:
      
      /opt/rocm/bin/rocprof -i ./in.txt --hip-trace --roctx-trace --flush-rate 10ms --timestamp on -d cagri_out --obj-tracking on /opt/rocm/bin/migraphx-driver trace ./resnet50-v2-7.onnx --onnx --gpu
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
      4a71ec8c
  8. 14 Oct, 2021 1 commit
  9. 08 Oct, 2021 2 commits
  10. 01 Oct, 2021 2 commits
    • Add multinomial op (#954) · 0b7672d7
      turneram authored
      
      
      Add multinomial op to onnx parser with ref and GPU implementations.
      
      The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative distribution function (CDF). The multinomial operator multiplies each random value by the sum of the CDF and returns the index of the first element of the CDF that is greater than the result. These indices represent samples randomly drawn from [0, class_size) that follow the log-probability distribution.
      
      Resolves #821
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
      0b7672d7
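The sampling scheme described in this commit can be sketched as below. This is a minimal illustration, not the migraphx implementation: `multinomial_sample` is a hypothetical name, and "the sum of the CDF" is read here as the CDF's final element (the total probability mass), which is an assumption.

```python
import bisect
import math
from itertools import accumulate

def multinomial_sample(log_probs, uniforms):
    """Draw one class index per uniform value in [0, 1).

    log_probs: log-probabilities of each class (need not be normalized).
    uniforms:  pre-generated random values, standing in for the literal
               that the parser inserts.
    """
    # Running sum of exp(log_prob) gives an unnormalized CDF.
    cdf = list(accumulate(math.exp(lp) for lp in log_probs))
    total = cdf[-1]  # assumption: the scaling factor is the last CDF entry
    samples = []
    for u in uniforms:
        # bisect_right returns the index of the first CDF entry > u * total.
        idx = bisect.bisect_right(cdf, u * total)
        samples.append(min(idx, len(cdf) - 1))  # guard against rounding at the top end
    return samples

log_probs = [math.log(0.1), math.log(0.2), math.log(0.7)]
picked = multinomial_sample(log_probs, [0.05, 0.2, 0.5, 0.95])
```

Scaling the random value by the total, rather than normalizing the CDF, avoids a division per class while giving the same indices.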
    • Add remaining random ops for Barracuda models (#963) · ccd08b4c
      turneram authored
      Add RandomNormal, RandomNormalLike, RandomUniform, and RandomUniformLike to onnx parser and onnx tests
      
      Each pair of Random*/Random*Like is implemented using a single op_parser because the ops share the same essential attributes and algorithm with the difference that Random*Like get the output type and/or shape from an input argument and Random* take both from attributes.
      
      Resolves #907
      Resolves #959
      ccd08b4c
  11. 29 Sep, 2021 1 commit
  12. 27 Sep, 2021 1 commit
  13. 23 Sep, 2021 1 commit
  14. 21 Sep, 2021 1 commit
  15. 17 Sep, 2021 3 commits
  16. 16 Sep, 2021 1 commit
    • Loop operator (#853) · a275f590
      Shucai Xiao authored
      
      
      Add Loop operator for opset version 13.
      Notes: 1) The default maximum number of iterations is 10 if none is provided.
      2) To change the maximum number of iterations, a user can set max_loop_iterations in the onnx_option struct when parsing a model.
      3) The returned shape of the scan output is determined by max_loop_iterations even if the actual number of iterations is smaller. This issue also applies to other operators such as NonZero and NonMaxSuppression. Issue #948 was created to track this and is to be resolved later.
      Co-authored-by: Paul <pfultz2@yahoo.com>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
      a275f590
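The effect of note 3 can be illustrated with a small simulation. This is a sketch under stated assumptions, not migraphx code: `run_loop`, `accumulate_until_six`, and the zero `pad_value` are hypothetical, and the ONNX trip-count and initial-condition inputs are left out for brevity.

```python
def run_loop(body, state, max_loop_iterations=10, pad_value=0):
    """Run `body` until it asks to stop or the iteration cap is reached.

    The scan output is allocated for max_loop_iterations entries up front,
    so its length does not depend on how many iterations actually ran.
    """
    scan_out = [pad_value] * max_loop_iterations  # fixed-size output buffer
    iterations = 0
    keep_going = True
    while keep_going and iterations < max_loop_iterations:
        keep_going, state, scan_value = body(iterations, state)
        scan_out[iterations] = scan_value
        iterations += 1
    return state, scan_out, iterations

def accumulate_until_six(i, total):
    """Example body: add the iteration index, stop once the total reaches 6."""
    total += i
    return total < 6, total, total

final_state, scan_out, ran = run_loop(accumulate_until_six, 0)
```

Here the loop stops after 4 iterations, yet `scan_out` still has 10 entries; a caller has to use the iteration count to know which entries are meaningful.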
  17. 10 Sep, 2021 1 commit
  18. 07 Sep, 2021 1 commit
    • qdq for quantization and include subgraph (#891) · b45f7239
      Shucai Xiao authored
      
      
      Add operators, refactor parsers, add rewrite passes, add tests
      Add ref implementations
      Move broadcasting of scales and zero points to onnx parser
      Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type
      fp16 and fp8 quantization to include subgraph and parameters
      fix unit test to use qdq operators for int8 quantization
      Co-authored-by: turneram <alturner@amd.com>
      b45f7239
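For readers unfamiliar with the q/dq operators mentioned in this commit, the scalar arithmetic defined by the ONNX QuantizeLinear and DequantizeLinear operators can be sketched as follows. This is not the migraphx ref implementation; the function names are illustrative and the int8 range used as the default here is an assumption.

```python
def quantize_linear(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Quantize one value: round(x / scale) + zero_point, saturated to [qmin, qmax].

    Python's round() rounds halves to the nearest even integer, which matches
    the rounding mode ONNX specifies for QuantizeLinear.
    """
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize_linear(q, scale, zero_point=0):
    """Dequantize one value; the arithmetic is carried out in floating point."""
    return (q - zero_point) * scale

q = quantize_linear(1.0, 0.5, zero_point=3)
restored = dequantize_linear(q, 0.5, zero_point=3)
```

A quantize followed by a dequantize with the same scale and zero point recovers the input up to rounding and saturation, which is what lets a pass remove redundant q/dq pairs.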
  19. 02 Sep, 2021 2 commits
  20. 31 Aug, 2021 2 commits
  21. 25 Aug, 2021 1 commit
    • Exclude param from deadcode elimination (#910) · 4b86a0aa
      Shucai Xiao authored
      
      
      * always keep parameters
      
      * clang format
      
      * fix tidy error
      
      * clang format
      
      * add more unit tests to have more code coverage
      
      * fix a bug so that get_parameter_names returns ordered parameter names
      
      * clang format
      
      * remove unnecessary print out
      
      * refine a code change
      
      * clang format
      
      * add a unit test to check parameter is not removed by dead code elimination
      
      * clang format
      
      * rename a function name
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
      4b86a0aa
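The behavior this commit describes, dead code elimination that always keeps parameters, can be sketched on a toy program representation. The dictionary layout and the name `eliminate_dead_code` are hypothetical and chosen only for illustration; migraphx's own pass works on its instruction list.

```python
def eliminate_dead_code(instructions, outputs):
    """Drop instructions that no output depends on, but always keep parameters.

    instructions: dict mapping name -> (op_name, [input names]), in program order.
    outputs:      names of the instructions whose results are returned.
    """
    live = set()
    stack = list(outputs)
    while stack:  # walk backwards from the outputs through their inputs
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        stack.extend(instructions[name][1])
    # Parameters survive even when unused, so the program's interface is stable.
    return {name: ins for name, ins in instructions.items()
            if name in live or ins[0] == "param"}

program = {
    "x": ("param", []),
    "unused": ("param", []),
    "one": ("literal", []),
    "dead": ("add", ["unused", "one"]),
    "y": ("relu", ["x"]),
}
pruned = eliminate_dead_code(program, ["y"])
```

In this example the unused literal and the dead add are removed, while the unused parameter stays, so callers can still pass the same set of arguments.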
  22. 24 Aug, 2021 1 commit
    • Change attributes names to be more consistent and reflect better meaning (#916) · 0d2606bb
      Umang Yadav authored
      * rename broadcast and multibroadcast output_lens attribute to out_lens attribute, and change tests and source code to reflect the same
      
      * change the reshape attribute from dims to out_lens
      
      * change transpose attribute's name from dims to perm to reflect better meaning
      
      * use permutation instead of perm for transpose
      
      clang formatting
      
      * use dims instead of out_lens for reshape
      
      clang formatting
      0d2606bb
  23. 23 Aug, 2021 1 commit
  24. 20 Aug, 2021 1 commit
  25. 19 Aug, 2021 1 commit
  26. 18 Aug, 2021 2 commits
    • Optimize Q/DQ Format Pass (#889) · 0b5f33b6
      turneram authored
      * Add operators, refactor parsers, add rewrite passes, add tests
      
      * Add ref implementations
      
      * Move broadcasting of scales and zero points to onnx parser
      
      * Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type
      
      * Switch certain variables to int64_t
      
      * Fix overflow in implicit constant conversion
      
      * Remove operators.hpp from includes in tf_test.cpp
      
      * Add conversion for int32 input to quantizelinear and add test case; remove operators.hpp from onnx_test.cpp includes
      
      * Switch dequantizelinear math from int32 to float
      
      * Remove changes to operators.hpp
      
      * Simplify apply_quantizelinear
      
      * Add verify test for int32 data
      
      * Add rewrite_quantization back to CMakeLists
      
      * Add passes to insert qdq after add_bias is applied, replace quant_ops, and remove remaining qdq pairs
      
      * Renaming, refactoring, cleaning up code, adding formal test, and adding passes to targets
      
      * Renaming, review comments, begin adding more specific tests
      
      * Add more specific unit tests
      
      * Fix failing test on CI
      
      * Correct matcher and update qop rewriting, update tests and add more tests
      
      * Update matcher, clean up simplify_qdq, tweak tests
      
      * Add tests, remove pass from CPU target, update dot parameters, clean up simplify_qdq
      
      * Fix correctness bug in ref q/dq implementations; edit gemm parser to make beta always 0.0
      
      * Remove unused variables in onnx gemm tests
      0b5f33b6
  27. 10 Aug, 2021 1 commit
    • Add option to compile with hiprtc (#892) · 91c9ebbc
      Paul Fultz II authored
      * Add hiprtc compile option
      * Add cross compile test
      * Update error reporting
      * Add tests for errors and warnings
      * Fix tidy warning
      * Add comment to ifdefs
      * Skip null character at end of log
      * Assert there is null at the end
      91c9ebbc
  28. 09 Aug, 2021 1 commit
  29. 05 Aug, 2021 1 commit
    • Add gpu driver and improvements to pointwise codegen (#851) · 29fa2666
      Paul Fultz II authored
      
      
      * Add method to compile pointwise
      
      * Formatting
      
      * Add lambda
      
      * Add semicolon
      
      * Rename variable
      
      * Add driver to run jit kernels
      
      * Formatting
      
      * Add context
      
      * Formatting
      
      * Make separate driver folder
      
      * Add more general gpu driver
      
      * Formatting
      
      * Print out wall time
      
      * Formatting
      
      * Run multiple times and skip first run
      
      * Formatting
      
      * Separate time_op
      
      * Run an op for comparison
      
      * Formatting
      
      * Add debug asserts
      
      * Formatting
      
      * Change parameter name
      
      * Formatting
      
      * Fix argument order
      
      * Formatting
      
      * Add preloading
      
      * Formatting
      
      * Allow a different data type
      
      * Formatting
      
      * Pipeline transformations
      
      * Formatting
      
      * Add vectorization
      
      * Formatting
      
      * Reduce dims
      
      * Formatting
      
      * Compile with launch params as constant
      
      * Formatting
      
      * Make sure buffer can be vectorized
      
      * Formatting
      
      * Enable vectorization and preloading
      
      * Formatting
      
      * Add print header
      
      * Formatting
      
      * Avoid allocating too much LDS
      
      * Formatting
      
      * Add some vec functions to a separate header
      
      * Formatting
      
      * Add stride loops
      
      * Formatting
      
      * Improve the transform pipeline
      
      * Formatting
      
      * Add const
      
      * Fix shape check
      
      * Formatting
      
      * Just check stride axis is zero
      
      * Remove extra finc_vector_axis overload
      
      * Simplify some more functions
      
      * Formatting
      
      * Remove some more extra functions
      
      * Formatting
      
      * Simplify more decltypes
      
      * Add another const
      
      * Fix test
      
      * Get buffer pointer different for older compilers
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
      29fa2666
  30. 04 Aug, 2021 1 commit