1. 16 Sep, 2022 2 commits
  2. 15 Sep, 2022 1 commit
  3. 14 Sep, 2022 3 commits
  4. 13 Sep, 2022 1 commit
    • Use rocblas_gemm_ex for batched gemms with broadcasted B (#1354) · a10a8ef1
      turneram authored
      Improves performance for four of the six GEMMs used by Hugging Face BERT models when batch_size > 1, by using a non-batched rocBLAS call for GEMMs whose B input has a broadcasted batch dimension.
      The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes.
      
      Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.
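A minimal sketch of why a batched GEMM with a broadcasted B can be served by one non-batched call: when every batch entry multiplies by the same B, the batch dimension of A can be folded into its rows. This is plain Python on nested lists for illustration only; the helper names are hypothetical and nothing here is taken from the MIGraphX or rocBLAS sources. Whether the commit flattens A in exactly this way is an assumption.

```python
def matmul(a, b):
    """Plain 2-D matrix product of a (M x K) and b (K x N), as nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def batched_gemm_broadcast_b(a_batch, b):
    """Reference path: one GEMM per batch entry, each reusing the same B."""
    return [matmul(a, b) for a in a_batch]


def flattened_gemm(a_batch, b):
    """Fold the batch dimension of A into its rows and issue a single GEMM."""
    m = len(a_batch[0])
    flat_a = [row for a in a_batch for row in a]  # (batch * M) x K
    flat_c = matmul(flat_a, b)                    # (batch * M) x N
    # Split the rows back into per-batch M x N results.
    return [flat_c[i:i + m] for i in range(0, len(flat_c), m)]


# batch = 3, M = 2, K = 2, N = 3 (toy sizes, not the BERT configurations)
a_batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
b = [[1, 0, 2], [0, 1, 3]]

assert flattened_gemm(a_batch, b) == batched_gemm_broadcast_b(a_batch, b)
```

The single-call form replaces `batch` small multiplications with one larger one, which is the kind of shape a non-batched GEMM routine handles directly.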
  5. 12 Sep, 2022 3 commits
  6. 08 Sep, 2022 2 commits
  7. 07 Sep, 2022 5 commits
  8. 06 Sep, 2022 5 commits
  9. 31 Aug, 2022 3 commits
  10. 30 Aug, 2022 4 commits
  11. 29 Aug, 2022 1 commit
  12. 27 Aug, 2022 2 commits
  13. 26 Aug, 2022 6 commits
  14. 24 Aug, 2022 1 commit
  15. 23 Aug, 2022 1 commit
    • Dynamic ref NMS (#1288) · fa3c21fa
      Charlie Lin authored
      Makes the NMS op output a dynamic shape (ONNX spec behavior).
      Allows a dynamic input shape for the NMS op.
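A minimal sketch of why the NMS output shape has to be dynamic: the number of boxes that survive suppression depends on the box coordinates and scores, not only on the input shape. This is a simplified greedy NMS for a single batch and a single class, written for illustration; it omits the ONNX operator's `max_output_boxes_per_class` and `score_threshold` inputs, and the function names are hypothetical rather than taken from MIGraphX.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap any already-kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


scores = [0.9, 0.8, 0.7]
overlapping = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 10, 10.5)]
disjoint = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]

# Same input shape (3 boxes), different number of selected indices.
kept_overlapping = nms(overlapping, scores, 0.5)
kept_disjoint = nms(disjoint, scores, 0.5)
```

Both calls take three boxes, yet the first keeps one index and the second keeps all three, so a fixed output shape could only be an upper bound.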