    Use rocblas_gemm_ex for batched gemms with broadcasted B (#1354) · a10a8ef1
    turneram authored
    Improves performance for 4 of the 6 GEMMs used by Hugging Face BERT models when batch_size > 1, by using a non-batched rocBLAS call for GEMMs whose B input has a broadcasted batch dimension.
    The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes.
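
The idea behind replacing a batched GEMM with a single non-batched call can be sketched as follows. This is only an illustration under stated assumptions, not the code from the commit: it assumes A is stored contiguously in row-major order with shape [batch, m, k] and that B (shape [k, n]) is identical for every batch, so A can be viewed as one [batch * m, k] matrix. A naive loop stands in for the rocBLAS call, and the names `gemm`, `batched_gemm_broadcast_b`, and `folded_gemm` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using matrix = std::vector<float>; // row-major storage

// Plain (non-batched) GEMM: C[m x n] = A[m x k] * B[k x n].
// Stand-in for a single library GEMM call.
matrix gemm(const matrix& a, const matrix& b, std::size_t m, std::size_t k, std::size_t n)
{
    assert(a.size() == m * k);
    assert(b.size() == k * n);
    matrix c(m * n, 0.0f);
    for(std::size_t i = 0; i < m; i++)
        for(std::size_t p = 0; p < k; p++)
            for(std::size_t j = 0; j < n; j++)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}

// Reference: one GEMM per batch, with the same B reused for every batch.
matrix batched_gemm_broadcast_b(const matrix& a,
                                const matrix& b,
                                std::size_t batch,
                                std::size_t m,
                                std::size_t k,
                                std::size_t n)
{
    matrix c;
    c.reserve(batch * m * n);
    for(std::size_t bi = 0; bi < batch; bi++)
    {
        // Copy out the bi-th [m x k] slice of A.
        matrix slice(a.data() + bi * m * k, a.data() + (bi + 1) * m * k);
        matrix ci = gemm(slice, b, m, k, n);
        c.insert(c.end(), ci.begin(), ci.end());
    }
    return c;
}

// Folded form: treat A as a single [batch * m, k] matrix and issue one GEMM.
matrix folded_gemm(const matrix& a,
                   const matrix& b,
                   std::size_t batch,
                   std::size_t m,
                   std::size_t k,
                   std::size_t n)
{
    return gemm(a, b, batch * m, k, n);
}
```

Both functions produce the same [batch, m, n] output layout, because each output row depends only on the matching row of A and on the shared B. The folding is not valid if A is not contiguous across the batch dimension or if B actually differs per batch.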
    
    Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.
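
The rewrite that moves multibroadcasts after concats relies on an equivalence that can be sketched in a few lines. This is a minimal illustration, not the matcher itself: it assumes two inputs that are each broadcast along the same axis (here, rows) and concatenated along a different axis (here, columns). The exact conditions checked by the matcher in simplify_reshapes are not shown in this commit summary, and all helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using row      = std::vector<int>;
using tensor2d = std::vector<row>;

// Broadcast a [1, n] row to [rows, n] by repeating it.
tensor2d broadcast_rows(const row& x, std::size_t rows) { return tensor2d(rows, x); }

// Concatenate two 1-D rows end to end.
row concat(const row& a, const row& b)
{
    row out = a;
    out.insert(out.end(), b.begin(), b.end());
    return out;
}

// Concatenate two 2-D tensors along the column axis.
tensor2d concat_cols(const tensor2d& a, const tensor2d& b)
{
    assert(a.size() == b.size());
    tensor2d out;
    out.reserve(a.size());
    for(std::size_t i = 0; i < a.size(); i++)
        out.push_back(concat(a[i], b[i]));
    return out;
}

// Original order: broadcast each input, then concat the large tensors.
tensor2d broadcast_then_concat(const row& x, const row& y, std::size_t rows)
{
    return concat_cols(broadcast_rows(x, rows), broadcast_rows(y, rows));
}

// Rewritten order: concat the small inputs, then broadcast once.
tensor2d concat_then_broadcast(const row& x, const row& y, std::size_t rows)
{
    return broadcast_rows(concat(x, y), rows);
}
```

The rewritten order concatenates the small, unbroadcast inputs and leaves a single broadcast at the end. The equivalence does not hold if the concat axis is the same as the broadcast axis.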
simplify_reshapes.cpp 27.7 KB