Use rocblas_gemm_ex for batched GEMMs with broadcasted B (#1354)
Improves performance of four of the six GEMMs used by Hugging Face BERT models with batch_size > 1. For GEMMs whose B input has a broadcasted batch dimension, it issues a single non-batched rocBLAS call instead of a batched one. The four verify tests added reproduce the actual configurations used by bert-base-cased, with varied batch sizes. Also adds a matcher to simplify_reshapes that moves multibroadcasts after concats.
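Why a single non-batched call can replace the batched one: if every batch entry multiplies by the same B (K x N), then C_i = A_i * B for each i is equivalent to one GEMM on A reshaped from (batch, M, K) to (batch*M, K). The sketch below demonstrates this identity in pure Python. It assumes A is contiguous and row-major, so the reshape needs no data movement. The function names are hypothetical and this is not the actual MIGraphX or rocBLAS code path.

```python
# Sketch: a batched GEMM whose B operand is broadcast across the batch
# (the same K x N matrix for every batch entry) equals one non-batched GEMM
# on A reshaped from (batch, M, K) to (batch*M, K). Pure Python, row-major.


def matmul(a, b):
    """Plain GEMM: a is M x K, b is K x N, both stored as lists of rows."""
    k = len(b)
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def batched_gemm_broadcast_b(a_batched, b):
    """Reference: one GEMM per batch entry, reusing the same B every time."""
    return [matmul(a_i, b) for a_i in a_batched]


def folded_gemm(a_batched, b):
    """Fold the batch dimension into M, run a single GEMM, then split back."""
    m = len(a_batched[0])
    # (batch, M, K) -> (batch*M, K): stacking the rows of each batch entry
    # mirrors a free reshape of contiguous row-major storage.
    a_flat = [row for a_i in a_batched for row in a_i]
    c_flat = matmul(a_flat, b)
    # (batch*M, N) -> (batch, M, N)
    return [c_flat[i:i + m] for i in range(0, len(c_flat), m)]


if __name__ == "__main__":
    batch, m, k, n = 3, 2, 4, 5
    a = [[[bi * 100 + i * 10 + p for p in range(k)] for i in range(m)]
         for bi in range(batch)]
    b = [[p * n + j for j in range(n)] for p in range(k)]
    assert folded_gemm(a, b) == batched_gemm_broadcast_b(a, b)
    print("folded GEMM matches batched GEMM")
```

The same reasoning does not apply when B varies per batch entry, or when A is not contiguous in the batch and M dimensions; those cases still need a genuinely batched call.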