  1. 09 Dec, 2021 2 commits
    • Softmax perf optimization (#1014) · 2e337c7f
      Shucai Xiao authored
      Changed the number of threads in a block from 256 to 128.
      Increased the maximum number of blocks in the kernel from 256 to 1M.
      When the axis is the last dimension, the index computation is removed since it is not required.
      
      With these changes, we get about a 2x speedup over the develop branch for the softmax op used in the BertSquad model.
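      The last-axis shortcut can be illustrated outside the kernel. The following is a minimal Python sketch, not the GPU kernel from the PR: it assumes a row-major tensor stored as a flat list, and the function names are hypothetical. Along a general axis the reduced elements are strided, so each one needs an explicit offset; along the last axis each batch item is a contiguous slice and no per-element index computation is needed.

```python
import math

def softmax_1d(values):
    """Numerically stable softmax of a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_general(flat, shape, axis):
    """Softmax over `axis` of a row-major tensor stored as a flat list.

    Every element along the axis needs an explicit flat offset.
    """
    n = shape[axis]
    inner = math.prod(shape[axis + 1:])
    outer = math.prod(shape[:axis])
    out = [0.0] * len(flat)
    for o in range(outer):
        for i in range(inner):
            offsets = [o * n * inner + k * inner + i for k in range(n)]
            probs = softmax_1d([flat[off] for off in offsets])
            for off, p in zip(offsets, probs):
                out[off] = p
    return out

def softmax_last_axis(flat, shape):
    """Last-axis case: each batch item is a contiguous slice of length n."""
    n = shape[-1]
    out = []
    for start in range(0, len(flat), n):
        out.extend(softmax_1d(flat[start:start + n]))
    return out
```

      Both functions give the same result when `axis` is the last dimension; the second simply skips the offset arithmetic.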
    • Fuse last instruction in fuse_pointwise (#1015) · e758d457
      Paul Fultz II authored
      This also fixes a bug with using an invalid iterator.
  2. 08 Dec, 2021 1 commit
  3. 07 Dec, 2021 2 commits
    • Rename reduce_inputs to virtual_inputs (#1021) · 1793cc54
      Paul Fultz II authored
      simple variable rename
    • Test runner match input output using tensor names (#996) · 0f9b4072
      Shucai Xiao authored
      1. The previous implementation assumed that the input and output .pb files are ordered, which is not the case. The tensor names in the input/output .pb files are now used to match the inputs and outputs of the ONNX model. (This change applies to the BERT_Squad model.)
      2. When parsing a model with dynamic input shapes, the current implementation uses the default batch_size for the unknown dims, which can cause parsing errors in some cases (e.g. the mask_rcnn model). The fix is to first read an input to get its shape, then use these shapes to parse the ONNX model.
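      The two points above can be sketched in plain Python. This is an illustration only, not the test runner's code: the function names, the `(name, data)` pair layout, and the `shape_of` callback are assumptions, and no ONNX or MIGraphX API is used.

```python
def match_by_name(model_names, loaded_tensors):
    """Order loaded tensors to follow the model's declared tensor names.

    loaded_tensors: list of (name, data) pairs in arbitrary file order.
    """
    by_name = dict(loaded_tensors)
    missing = [name for name in model_names if name not in by_name]
    if missing:
        raise KeyError(f"no data file for tensors: {missing}")
    return [by_name[name] for name in model_names]

def input_shapes(loaded_tensors, shape_of):
    """Collect concrete shapes from real input data, keyed by tensor name.

    The resulting mapping could then be supplied when parsing a model
    whose inputs have unknown dims, instead of a default batch size.
    """
    return {name: shape_of(data) for name, data in loaded_tensors}
```

      Matching through a dictionary makes the result independent of the order in which the files happen to be listed.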
  4. 05 Dec, 2021 1 commit
  5. 02 Dec, 2021 1 commit
  6. 30 Nov, 2021 2 commits
  7. 25 Nov, 2021 2 commits
  8. 24 Nov, 2021 1 commit
  9. 22 Nov, 2021 3 commits
    • Helper script for rocTX run and parse (#985) · 4f9a0ce7
      Cagri authored
      This provides a helper script that runs rocTX markers with migraphx-driver, reducing the number of steps a user goes through when using the rocTX knob.
      Run:
      python roctx.py --run --onnx_file <ONNX_FILE> --migraphx_args "--onnx --gpu --fp16 --batch 16" --out outputfolder
      Runs the model and parses the run output (a JSON file). An example output is given below:
      
                                                           SUM  MIN  MAX
      Marker start: gpu::convolution                      5272   10  563
      Marker start: gpu::add_relu                          605   12   18
      Marker start: gpu::gather                            299  145  154
      Marker start: gpu::mul_add                           227   14   57
      Marker start: gpu::sub                               177   13   42
      Marker start: gpu::concat                            169   22   31
      Marker start: gpu::triadd_relu                       163   15   18
      Marker start: load                                   141    0    3
      Marker start: hip::hip_copy_literal                  111    0    3
      Marker start: gpu::add                                58   13   17
      Marker start: broadcast                               52    0    3
      Marker start: gpu::convert                            31   15   16
      Marker start: slice                                   11    0    1
      Marker start: gpu::pooling                             9    9    9
      Marker start: step                                     2    2    2
      Marker start: @param                                   2    0    1
      Marker start: reshape                                  1    0    1
      Marker start: hip::hip_allocate_memory                 1    1    1
      Marker start: check_context::migraphx::version_...     0  ERR  ERR
      
      TOTAL TIME: 7331 us
      
      JSON FILE PATH: [...]/rpl_data_211019_195229_9369/input_results_211019_195229/trace.json
      Parse:
      python roctx.py --parse --json_path <JSON PATH FROM RUN>
      Note: The parse knob is available for parsing an already existing JSON output.
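      The SUM/MIN/MAX table in the example output can be reproduced from per-marker durations with a few lines of Python. This is a hedged sketch, not the code in roctx.py: the `(marker_name, duration_us)` input format is an assumption, and the layout of trace.json is not modelled. In the example above, TOTAL TIME (7331 us) equals the sum of the SUM column, which is what `total_time` computes.

```python
from collections import defaultdict

def summarize(events):
    """Aggregate (marker_name, duration_us) pairs per marker.

    Returns rows of (name, total, shortest, longest),
    sorted by total duration in descending order.
    """
    durations = defaultdict(list)
    for name, duration in events:
        durations[name].append(duration)
    rows = [(name, sum(d), min(d), max(d)) for name, d in durations.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

def total_time(rows):
    """Sum of the per-marker totals."""
    return sum(total for _, total, _, _ in rows)
```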
    • Add fp16 verify to driver (#988) · 3c1e91dc
      kahmed10 authored
      Allows --fp16 to be used in the driver to compare the target fp16 result and the ref fp32 result.
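      Comparing an fp16 result with an fp32 reference needs a tolerance that allows for half-precision rounding. The sketch below illustrates the idea in plain Python; it is not the driver's verification code, and the function names and the tolerance values are assumptions chosen for the example.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def verify(result, reference, rtol=1e-3, atol=1e-3):
    """Return True if every result is within atol + rtol * |reference|."""
    return all(abs(r - g) <= atol + rtol * abs(g)
               for r, g in zip(result, reference))
```

      A tolerance tuned for fp32 against fp32 would reject values that differ only by fp16 rounding, which is why the comparison is parameterized.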
    • Fix target flag · 636bce89
      Paul authored
  10. 18 Nov, 2021 1 commit
  11. 17 Nov, 2021 1 commit
    • Handle removing contiguous on operators that use modules (#1005) · 785307c3
      Paul Fultz II authored
      Currently, eliminate_contiguous never removes contiguous for operators that use module inputs, because it doesn't pass the module inputs to compute_shape.
      
      - Update to pass the module inputs correctly to compute_shape
      - Fix the overloads of compute_shape so that when passed an empty vector of module inputs it will call the overload without module inputs
      - Add tests with contiguous and pointwise module function.
      - Move the add_pointwise function to a separate header to reuse it across different tests
  12. 15 Nov, 2021 1 commit
  13. 11 Nov, 2021 1 commit
    • Conditionally enable pointwise fusion (#992) · 157935ff
      Paul Fultz II authored
      This enables the pointwise fusions using the MIGRAPHX_ENABLE_POINTWISE_FUSION env variable. It is disabled by default since MIOpen fusions need to be refactored.
      
      This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math pass with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.
  14. 10 Nov, 2021 1 commit
    • Turn on gemm unit tests (#997) · 38287064
      Shucai Xiao authored
      
      
      This PR turns on a few gemm unit tests with the int8 input data type. Before rocm4.4, the rocblas implementation required the matrix size to be no less than 4 for int8 input data. Because of this limitation, we had turned off a few gemm unit tests with the int8 input data type.
      
      This limitation is removed in rocm4.4, so after we upgrade to rocm4.5, we can turn on these unit tests. We also change the unit test conv_bn_add to add instructions to a module instead of a program.
      Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
  15. 09 Nov, 2021 1 commit
  16. 08 Nov, 2021 1 commit
  17. 05 Nov, 2021 1 commit
  18. 03 Nov, 2021 1 commit
    • Add tests for the DepthToSpace+Binary pointwise operations fusion (#987) · eb6abd27
      Umang Yadav authored
      In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape.
      
      If a binary pointwise operator follows DepthToSpace, migraphx can move the binary operator before the contiguous and reshape of the DepthToSpace.
      
      So, it becomes reshape --> transpose --> binary_op --> contiguous --> reshape.
      
      An explicit contiguous is then not required, since binary_op outputs a standard shape. So, it becomes reshape --> transpose --> binary_op --> reshape.
      
      simplify_reshapes already has a matcher that can do this transformation. This PR adds tests for cases like DepthToSpace + binary op.
      
      solves #905
  19. 28 Oct, 2021 4 commits
    • NonMaxSuppression op ref implementation (#968) · c98b22d8
      Shucai Xiao authored
      This PR is the ref implementation of the nonmaxsuppression operator. It always returns the max possible output shape, which is the problem tracked in issue #948.
    • DepthToSpace and pointwise unary operations fusion (#986) · cf0b6d6d
      Umang Yadav authored
      In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape.
      
      This PR adds a matcher to find d2s + unary pointwise ops.
      
      Applying the matcher moves the pointwise unary operation before the contiguous and reshape of the d2s.
      So it becomes
      reshape --> transpose --> unary --> contiguous --> reshape.
      
      The motivation is that a pointwise module would later be created out of unary --> contiguous --> reshape. Codegen for this pointwise module can write out the buffer such that an explicit contiguous and reshape wouldn't be required.
      
      This transformation is not guaranteed to improve performance, since the unary op will operate on a non-standard shape. So, we would need some tuning mechanism to make the decision.
      
      #905 pending PR for binary operations.
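      Why the unary op can be moved is easy to check on values: a pointwise op commutes with the data movement that d2s performs. The Python sketch below is an illustration, not migraphx code. It assumes the ONNX DCR layout for DepthToSpace and a row-major flat list; `transpose_contiguous` and `depth_to_space` are hypothetical helper names. It checks only that the values agree and does not model the strided access that makes the performance effect uncertain.

```python
import itertools
import math

def transpose_contiguous(flat, shape, perm):
    """Permute the axes of a row-major tensor and materialize the result."""
    strides = [math.prod(shape[i + 1:]) for i in range(len(shape))]
    new_shape = [shape[p] for p in perm]
    out = []
    for idx in itertools.product(*(range(d) for d in new_shape)):
        # Axis k of the output corresponds to axis perm[k] of the input.
        out.append(flat[sum(i * strides[p] for i, p in zip(idx, perm))])
    return out

def depth_to_space(flat, n, c, h, w, block):
    """reshape --> transpose --> contiguous --> reshape on a flat buffer."""
    # First reshape: no data movement on a contiguous buffer.
    shape6 = [n, block, block, c // (block * block), h, w]
    # Transpose + contiguous: the only step that moves data.
    out = transpose_contiguous(flat, shape6, [0, 3, 4, 1, 5, 2])
    # Final reshape to (n, c / block^2, h * block, w * block): no data movement.
    return out

x = [float(v) for v in range(16)]
square = lambda v: v * v

# Unary op applied after d2s versus before d2s.
after = [square(v) for v in depth_to_space(x, 1, 4, 2, 2, 2)]
before = depth_to_space([square(v) for v in x], 1, 4, 2, 2, 2)
assert after == before
```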
    • Roialign gpu impl (#972) · 912c8d22
      Shucai Xiao authored
      GPU implementation of the roialign operator, using the jit approach to reduce the lib size.
    • Change to read the docs theme (#990) · 6df1e02b
      kahmed10 authored
      Updates the theme of our documentation so that it matches the rest of the ROCm libraries.
  20. 20 Oct, 2021 1 commit
    • Roialign (#952) · d7653732
      Shucai Xiao authored
      Implementation of the roialign operator. For now, only the ref implementation exists. When a model runs on the GPU, execution of this operator falls back to the ref implementation.
  21. 19 Oct, 2021 2 commits
  22. 18 Oct, 2021 2 commits
  23. 15 Oct, 2021 1 commit
    • Enabling rocTX markers for migraphx-driver via roctx knob (#946) · 4a71ec8c
      Cagri authored
      
      
      Added features:
      This enables wrapping each migraphx operator with rocTX markers.
      It adds a new knob, trace, to the migraphx-driver binary.
      
      Limitation:
      
      rocTX on its own does not output a file; it needs to be used with rocprof. Example command line:
      
      /opt/rocm/bin/rocprof -i ./in.txt --hip-trace --roctx-trace --flush-rate 10ms --timestamp on -d cagri_out --obj-tracking on /opt/rocm/bin/migraphx-driver trace ./resnet50-v2-7.onnx --onnx --gpu
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
  24. 14 Oct, 2021 1 commit
  25. 13 Oct, 2021 3 commits
  26. 09 Oct, 2021 1 commit
  27. 08 Oct, 2021 1 commit