"...targets/git@developer.sourcefind.cn:gaoqiong/migraphx.git" did not exist on "59b80d4e8377a346f96137537de5d69c9b6e088d"
-
turneram authored
Improves performance for four of the six GEMMs used by Hugging Face BERT models when batch_size > 1 by issuing a non-batched rocBLAS call for GEMMs whose B input has a broadcasted batch dimension. The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes. Also adds a matcher to simplify_reshapes that moves multibroadcasts after concats.
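The idea behind both changes can be illustrated with a minimal NumPy sketch. This is not the MIGraphX or rocBLAS code from this commit; the function names and the small shapes below are hypothetical and chosen only for illustration. The first function shows that when B is shared across the batch, the batch dimension of A can be folded into its row dimension so that one plain 2-D GEMM replaces the batched one. The second shows the identity that plausibly motivates moving a multibroadcast after a concat, assuming the concat axis is not the broadcast axis.

```python
import numpy as np


def gemm_broadcast_b(a, b):
    """Multiply a (batch, m, k) tensor by a (k, n) matrix shared across the batch.

    Hypothetical helper: folds the batch dimension into the rows of A and
    performs a single 2-D matrix multiply instead of a batched one.
    """
    batch, m, k = a.shape
    if b.shape[0] != k:
        raise ValueError("inner dimensions of a and b do not match")
    n = b.shape[1]
    a2d = a.reshape(batch * m, k)    # (batch*m, k)
    c2d = a2d @ b                    # one non-batched GEMM
    return c2d.reshape(batch, m, n)  # restore the batch dimension


def concat_then_broadcast(x, y, batch):
    """Concatenate along the last axis first, then broadcast once.

    Hypothetical helper: x and y have shape (1, p) and (1, q).
    """
    joined = np.concatenate([x, y], axis=1)
    return np.broadcast_to(joined, (batch, joined.shape[1]))


rng = np.random.default_rng(0)

# Illustrative shapes only; these are not the bert-base-cased configurations.
a = rng.standard_normal((4, 8, 16))
b = rng.standard_normal((16, 32))

# Reference: batched matmul with B explicitly broadcast over the batch.
batched = np.matmul(a, np.broadcast_to(b, (4, 16, 32)))
flat = gemm_broadcast_b(a, b)

x = rng.standard_normal((1, 3))
y = rng.standard_normal((1, 5))

# Reference: broadcast each input first, then concatenate.
broadcast_first = np.concatenate(
    [np.broadcast_to(x, (4, 3)), np.broadcast_to(y, (4, 5))], axis=1
)
broadcast_last = concat_then_broadcast(x, y, 4)
```

In the sketch, `flat` matches `batched` up to floating-point rounding, and `broadcast_last` equals `broadcast_first` exactly. The reshape in `gemm_broadcast_b` relies on A being contiguous in row-major order, which is an assumption of this sketch.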
a10a8ef1