Unverified commit 53a0cf8b authored by Liangfu Chen, committed by GitHub

[Neuron] trim attention kernel tests to fit trn1.2x instance (#14988)


Signed-off-by: Liangfu Chen <liangfc@amazon.com>
parent 5eeabc2a
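The hunk below opens with a unified-diff header. `-314,7` means the old range starts at line 314 and spans 7 lines, and `+314,7` means the same for the new file, so the line count is unchanged. The text after the second `@@` is the enclosing function, which the scrape has truncated. Here is a minimal sketch of reading such a header. `parse_hunk_header` and `Hunk` are hypothetical helpers, not part of the repository:

```python
import re
from typing import NamedTuple

# Unified diff hunk header: "@@ -old_start[,old_len] +new_start[,new_len] @@ context"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(.*)$")


class Hunk(NamedTuple):
    old_start: int
    old_len: int
    new_start: int
    new_len: int
    context: str  # function/section line that diff tools print after the header


def parse_hunk_header(line: str) -> Hunk:
    """Hypothetical helper: split a unified-diff hunk header into its ranges."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len, context = m.groups()
    # An omitted length means the range covers exactly one line.
    return Hunk(int(old_start), int(old_len or 1),
                int(new_start), int(new_len or 1), context.strip())


if __name__ == "__main__":
    h = parse_hunk_header("@@ -314,7 +314,7 @@ def get_active_block_tables(")
    print(h.old_start, h.old_len, h.new_start, h.new_len)
```

Equal old and new lengths, as in this commit, indicate lines were edited in place rather than added or removed.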
@@ -314,7 +314,7 @@ def get_active_block_tables(block_tables, query_lens, seq_lens, block_size,
# Test edge cases
(1, 128, 16, 1024, 4, 2, 16, False), # large decode batch
(16, 4, 8, 8192, 48, 1, 128, True), # large prefill batch
(16, 4, 8, 1024, 4, 2, 128, True), # large prefill batch
(4, 12, 32, 2048, 16, 1, 32, True), # multi-head attention (MHA)
(4, 12, 32, 2048, 16, 16, 32, True), # multi-query attention (MQA)
])
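Each tuple above is one parametrized test case. The MHA and MQA comments follow from the head counts, and the sketch below decodes the tuples to show this. It assumes the field order (prefill_batch_size, decode_batch_size, block_size, large_tile_size, num_heads, num_queries_per_kv, head_size, mixed_precision). Those names do not appear in this hunk and are assumptions. `KernelCase` and `attention_kind` are hypothetical helpers:

```python
from typing import NamedTuple


class KernelCase(NamedTuple):
    # ASSUMED field names; the hunk above shows only the tuple values.
    prefill_batch_size: int
    decode_batch_size: int
    block_size: int
    large_tile_size: int
    num_heads: int
    num_queries_per_kv: int
    head_size: int
    mixed_precision: bool

    @property
    def num_kv_heads(self) -> int:
        # Each KV head is shared by num_queries_per_kv query heads.
        return self.num_heads // self.num_queries_per_kv


def attention_kind(case: KernelCase) -> str:
    """Hypothetical helper: classify a case by its query-to-KV head ratio."""
    if case.num_queries_per_kv == 1:
        return "MHA"  # one KV head per query head
    if case.num_kv_heads == 1:
        return "MQA"  # all query heads share a single KV head
    return "GQA"  # groups of query heads share each KV head


# The edge-case tuples from the hunk above, in the order shown.
EDGE_CASES = [
    KernelCase(1, 128, 16, 1024, 4, 2, 16, False),
    KernelCase(16, 4, 8, 8192, 48, 1, 128, True),
    KernelCase(16, 4, 8, 1024, 4, 2, 128, True),
    KernelCase(4, 12, 32, 2048, 16, 1, 32, True),
    KernelCase(4, 12, 32, 2048, 16, 16, 32, True),
]

if __name__ == "__main__":
    for case in EDGE_CASES:
        print(case.num_heads, case.num_kv_heads, attention_kind(case))
```

Under these assumed names, the `(4, 12, 32, 2048, 16, 1, 32, True)` case keeps one KV head per query head (MHA). The `(4, 12, 32, 2048, 16, 16, 32, True)` case collapses 16 query heads onto a single KV head (MQA). In a real test module, a list like `EDGE_CASES` would be passed to `pytest.mark.parametrize`.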