1. 06 Oct, 2022 1 commit
  2. 03 Oct, 2022 1 commit
  3. 29 Sep, 2022 3 commits
  4. 28 Sep, 2022 1 commit
    • Add compute_fp32 flag for quant_gemm tests (#1360) · 70e63960
      Umang Yadav authored
      test_gpu_pack_int8_args fails on gfx908 machines because it does not set the compute_fp32 flag correctly. This PR fixes the test so that it checks the device name and the rocBLAS version and sets the flag accordingly.
  5. 27 Sep, 2022 1 commit
  6. 26 Sep, 2022 3 commits
  7. 24 Sep, 2022 2 commits
  8. 23 Sep, 2022 1 commit
  9. 21 Sep, 2022 2 commits
  10. 19 Sep, 2022 4 commits
  11. 16 Sep, 2022 7 commits
  12. 15 Sep, 2022 2 commits
  13. 14 Sep, 2022 4 commits
  14. 13 Sep, 2022 1 commit
    • Use rocblas_gemm_ex for batched gemms with broadcasted B (#1354) · a10a8ef1
      turneram authored
      Improves performance for 4 of the 6 GEMMs used by Hugging Face BERT models with batch_size > 1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension.
      The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes.
      
      Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.
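The commit message above does not say how the non-batched call is set up. One plausible reading, shown here only as a sketch, is that when every batch shares the same B, the batch dimension of A can be folded into its row dimension so that a single GEMM covers all batches. The helper names (`matmul`, `batched_gemm_broadcast_b`, `folded_gemm`) are hypothetical and are not taken from MIGraphX or rocBLAS.

```python
def matmul(a, b):
    """Plain 2-D matrix product on nested lists (rows of a times columns of b)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def batched_gemm_broadcast_b(a_batches, b):
    """Reference path: one GEMM per batch, reusing the same B every time."""
    return [matmul(a, b) for a in a_batches]


def folded_gemm(a_batches, b):
    """Assumed optimization: stack the batches of A into one (batch*M) x K
    matrix, run a single GEMM, then split the result back into batches."""
    m = len(a_batches[0])
    stacked = [row for a in a_batches for row in a]
    flat = matmul(stacked, b)
    return [flat[i:i + m] for i in range(0, len(flat), m)]


# Two batches of 2x2 A matrices sharing one 2x3 B matrix.
a_batches = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
b = [[1, 0, 2], [0, 1, 3]]

# Both paths must agree for the folding to be a valid substitution.
assert folded_gemm(a_batches, b) == batched_gemm_broadcast_b(a_batches, b)
```

The folding is only valid because B is identical across batches. If B varied per batch, each batch would need its own product.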
  15. 09 Sep, 2022 1 commit
  16. 08 Sep, 2022 2 commits
  17. 07 Sep, 2022 1 commit
  18. 06 Sep, 2022 1 commit
  19. 31 Aug, 2022 1 commit
  20. 29 Aug, 2022 1 commit