1. 16 Feb, 2022 1 commit
  2. 11 Feb, 2022 2 commits
  3. 09 Feb, 2022 2 commits
  4. 08 Feb, 2022 3 commits
  5. 02 Feb, 2022 1 commit
      Update trace_eval to preview the output buffers (#1073) · b20e3d4d
      Paul Fultz II authored
      Previously, MIGRAPHX_TRACE_EVAL=2 printed the entire output buffer, which can produce a lot of output. To make inspection and debugging easier, MIGRAPHX_TRACE_EVAL=2 now prints only 10 elements from the buffer (the first 5 and the last 5) and reports any floating-point classifications found in the buffer (e.g., NaNs, infinities). The previous behavior can still be enabled with MIGRAPHX_TRACE_EVAL=3.
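A minimal Python sketch of the preview idea, for illustration only: MIGraphX implements this in C++, and the function names, the `"..."` separator and the class labels below are hypothetical rather than the actual MIGRAPHX_TRACE_EVAL output format. It keeps the first 5 and last 5 elements but classifies every element, so a NaN or infinity in the middle of a large buffer is still reported even though it is not printed.

```python
import math
import sys
from collections import Counter


def classify(x: float) -> str:
    """Hypothetical labels, loosely following std::fpclassify categories."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf"
    if x == 0.0:
        return "zero"
    # Python floats are doubles, so this is the double-precision subnormal bound
    if abs(x) < sys.float_info.min:
        return "subnormal"
    return "normal"


def preview(buffer, n=5):
    """Return the first n and last n elements plus a count of each fp class."""
    if len(buffer) <= 2 * n:
        shown = list(buffer)
    else:
        shown = list(buffer[:n]) + ["..."] + list(buffer[-n:])
    # Classes are counted over the whole buffer, not just the shown elements
    classes = Counter(classify(x) for x in buffer)
    return shown, dict(classes)
```

For fp16 or fp32 buffers the subnormal threshold would be that type's own minimum normal value rather than the double one used here.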
  6. 01 Feb, 2022 1 commit
  7. 31 Jan, 2022 1 commit
  8. 28 Jan, 2022 3 commits
  9. 27 Jan, 2022 1 commit
  10. 26 Jan, 2022 1 commit
      Add HardSwish op ONNX parser (#1066) · 7477aeb8
      turneram authored
      Add HardSwish to HardSigmoid parser
      
      HardSwish formula is y = x * HardSigmoid<alpha=1/6, beta=0.5>(x)
      The HardSigmoid parser sets alpha to 1/6 and appends a mul instruction when the op name is HardSwish.
      
      Resolves #1062
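Following the formula above, a Python sketch of HardSwish as x multiplied by HardSigmoid with alpha = 1/6 and beta = 0.5; the final multiplication corresponds to the mul instruction the parser appends. The function names are hypothetical, and this is a reference computation rather than the parser code.

```python
def hard_sigmoid(x: float, alpha: float = 0.2, beta: float = 0.5) -> float:
    # ONNX HardSigmoid: max(0, min(1, alpha * x + beta))
    return min(max(alpha * x + beta, 0.0), 1.0)


def hard_swish(x: float) -> float:
    # HardSwish(x) = x * HardSigmoid<alpha=1/6, beta=0.5>(x)
    return x * hard_sigmoid(x, alpha=1.0 / 6.0, beta=0.5)
```

For x >= 3 the HardSigmoid factor saturates at 1, so HardSwish returns x unchanged; for x <= -3 it saturates at 0.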
  11. 21 Jan, 2022 4 commits
  12. 20 Jan, 2022 2 commits
  13. 17 Jan, 2022 1 commit
  14. 11 Jan, 2022 1 commit
      HardSigmoid ONNX parser (#1040) · fc42d852
      turneram authored
      Add HardSigmoid ONNX parser and unit tests.
      Produces the mathematical equivalent of the ONNX operator by combining existing pointwise ops.
      Resolves #1028
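The ONNX HardSigmoid operator computes y = max(0, min(1, alpha * x + beta)), with default alpha = 0.2 and beta = 0.5. The Python sketch below expresses it as a chain of pointwise steps (multiply, add, clip). The helper names are hypothetical, and this particular mul/add/clip decomposition is an assumption for illustration, not the exact instruction sequence the MIGraphX parser emits.

```python
def mul(a: float, b: float) -> float:
    return a * b


def add(a: float, b: float) -> float:
    return a + b


def clip(x: float, lo: float, hi: float) -> float:
    return min(max(x, lo), hi)


def hard_sigmoid(x: float, alpha: float = 0.2, beta: float = 0.5) -> float:
    # alpha * x, then + beta, then clamp to [0, 1]
    return clip(add(mul(alpha, x), beta), 0.0, 1.0)
```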
  15. 10 Jan, 2022 1 commit
  16. 05 Jan, 2022 1 commit
  17. 10 Dec, 2021 1 commit
  18. 09 Dec, 2021 2 commits
      Softmax perf optimization (#1014) · 2e337c7f
      Shucai Xiao authored
      Changed the number of threads in a block from 256 to 128.
      Increased the maximum number of blocks in the kernel from 256 to 1M.
      When the softmax axis is the last dimension, the index computation is removed since it is not required.
      
      With these changes, the softmax op used in the BertSquad model runs about 2x faster than on the develop branch.
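To illustrate why the index computation can be dropped when the softmax axis is the last dimension: assuming a packed row-major (standard) layout, each softmax row is a contiguous run of n elements, so element j of row r sits at offset r * n + j and no multi-dimensional index has to be reconstructed. This Python sketch shows only that addressing plus a numerically stable softmax; it does not model the GPU launch configuration (128 threads per block, up to 1M blocks), and the function name is hypothetical.

```python
import math


def softmax_last_axis(data, shape):
    """Softmax over the last axis of a flat, row-major (packed) buffer."""
    n = shape[-1]                  # length of the softmax axis
    rows = len(data) // n          # all other dimensions collapse into rows
    out = [0.0] * len(data)
    for r in range(rows):
        base = r * n               # row r starts at a fixed offset
        row = data[base:base + n]
        m = max(row)               # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        for j, e in enumerate(exps):
            out[base + j] = e / total
    return out
```

For a softmax over a non-last axis the elements of one row are strided, so each linear id would have to be converted to a multi-dimensional index first.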
      Fuse last instruction in fuse_pointwise (#1015) · e758d457
      Paul Fultz II authored
      Fuse last instruction in fuse_pointwise
      This also fixes a bug caused by using an invalid iterator.
  19. 08 Dec, 2021 1 commit
  20. 07 Dec, 2021 2 commits
      Rename reduce_inputs to virtual_inputs (#1021) · 1793cc54
      Paul Fultz II authored
      Simple variable rename.
      Test runner match input output using tensor names (#996) · 0f9b4072
      Shucai Xiao authored
      1. The previous implementation assumed that the input and output .pb files were ordered, which is not always the case. The test runner now uses the tensor names in the input/output .pb files to match them to the inputs and outputs of the ONNX model. (This change applies to the BERT_Squad model.)
      2. When parsing a model with a dynamic input shape, the previous implementation used the default batch_size for the unknown dimensions, which can cause parsing errors in some cases (e.g., the mask_rcnn model). The fix is to first read an input to obtain its shape, then use these shapes to parse the ONNX model.
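A minimal Python sketch of both fixes, assuming each .pb file has already been decoded into a tensor name plus a (shape, data) pair. The type alias, the function names and the tensor names used in the usage note are hypothetical, not the test runner's actual code. Inputs are looked up by tensor name instead of by file order, and the shapes of the loaded tensors are collected so they can be handed to the parser in place of default dimensions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a decoded .pb tensor: (shape, flat data)
Tensor = Tuple[Tuple[int, ...], List[float]]


def match_by_name(model_names: Sequence[str], loaded: Dict[str, Tensor]) -> List[Tensor]:
    """Return loaded tensors in the order the model declares its inputs."""
    missing = [name for name in model_names if name not in loaded]
    if missing:
        raise KeyError(f"no .pb tensor for model inputs: {missing}")
    return [loaded[name] for name in model_names]


def input_shapes(loaded: Dict[str, Tensor]) -> Dict[str, Tuple[int, ...]]:
    """Collect concrete shapes to use instead of default dims for dynamic inputs."""
    return {name: shape for name, (shape, _) in loaded.items()}
```

For example, if the files decode to `segment_ids` before `input_ids` but the model declares `input_ids` first, `match_by_name` still returns the `input_ids` tensor first.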
  21. 05 Dec, 2021 1 commit
  22. 02 Dec, 2021 1 commit
  23. 30 Nov, 2021 2 commits
  24. 25 Nov, 2021 2 commits
  25. 24 Nov, 2021 1 commit
  26. 22 Nov, 2021 1 commit
      Helper script for rocTX run and parse (#985) · 4f9a0ce7
      Cagri authored
      This provides a helper script that runs rocTX markers with migraphx-driver and reduces the number of steps a user has to go through when using the rocTX knob.
      Run:
      python roctx.py --run --onnx_file <ONNX_FILE> --migraphx_args "--onnx --gpu --fp16 --batch 16" --out outputfolder
      Runs and parses the run output (JSON file). An example output is given below:
      
                                                           SUM  MIN  MAX
      Marker start: gpu::convolution                      5272   10  563
      Marker start: gpu::add_relu                          605   12   18
      Marker start: gpu::gather                            299  145  154
      Marker start: gpu::mul_add                           227   14   57
      Marker start: gpu::sub                               177   13   42
      Marker start: gpu::concat                            169   22   31
      Marker start: gpu::triadd_relu                       163   15   18
      Marker start: load                                   141    0    3
      Marker start: hip::hip_copy_literal                  111    0    3
      Marker start: gpu::add                                58   13   17
      Marker start: broadcast                               52    0    3
      Marker start: gpu::convert                            31   15   16
      Marker start: slice                                   11    0    1
      Marker start: gpu::pooling                             9    9    9
      Marker start: step                                     2    2    2
      Marker start: @param                                   2    0    1
      Marker start: reshape                                  1    0    1
      Marker start: hip::hip_allocate_memory                 1    1    1
      Marker start: check_context::migraphx::version_...     0  ERR  ERR
      
      TOTAL TIME: 7331 us
      
      JSON FILE PATH: [...]/rpl_data_211019_195229_9369/input_results_211019_195229/trace.json
      Parse:
      python roctx.py --parse --json_path <JSON PATH FROM RUN>
      Note: The parse knob is available for parsing an already existing JSON output.
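In the example output, TOTAL TIME (7331 us) equals the sum of the SUM column, and rows are sorted by SUM in descending order. Below is a Python sketch of that aggregation. It assumes the trace is Chrome trace-event JSON whose events carry `name` and `dur` fields; that format, the function names and the column widths are assumptions, since the commit does not show how roctx.py parses trace.json, and the ERR entries are not modeled.

```python
import json
from collections import defaultdict


def summarize(trace_json: str):
    """Aggregate per-marker durations into (name, SUM, MIN, MAX) rows."""
    events = json.loads(trace_json).get("traceEvents", [])
    durations = defaultdict(list)
    for ev in events:
        if "dur" in ev:  # skip events without a duration (e.g. metadata)
            durations[ev["name"]].append(ev["dur"])
    rows = [(name, sum(d), min(d), max(d)) for name, d in durations.items()]
    rows.sort(key=lambda r: r[1], reverse=True)  # largest total first
    total = sum(r[1] for r in rows)
    return rows, total


def print_table(rows, total):
    print(f"{'':<50}{'SUM':>6}{'MIN':>5}{'MAX':>5}")
    for name, s, lo, hi in rows:
        print(f"{'Marker start: ' + name:<50}{s:>6}{lo:>5}{hi:>5}")
    print(f"\nTOTAL TIME: {total} us")
```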