[Neuron] trim attention kernel tests to fit trn1.2x instance (#14988)

Signed-off-by: Liangfu Chen <liangfc@amazon.com>

Commit 53a0cf8b, authored Mar 18, 2025 by Liangfu Chen, committed by GitHub Mar 18, 2025

Showing 1 changed file with 1 addition and 1 deletion

tests/neuron/1_core/test_prefix_prefill.py +1 -1

--- a/tests/neuron/1_core/test_prefix_prefill.py
+++ b/tests/neuron/1_core/test_prefix_prefill.py
@@ -314,7 +314,7 @@ def get_active_block_tables(block_tables, query_lens, seq_lens, block_size,

        # Test edge cases
        (1, 128, 16, 1024, 4, 2, 16, False),  # large decode batch
-        (16, 4, 8, 8192, 48, 1, 128, True),  # large prefill batch
+        (16, 4, 8, 1024, 4, 2, 128, True),  # large prefill batch
        (4, 12, 32, 2048, 16, 1, 32, True),  # multi-head attention (MHA)
        (4, 12, 32, 2048, 16, 16, 32, True),  # multi-query attention (MQA)
    ])
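
To see exactly what the trim changes, the old and new "large prefill batch" tuples can be compared position by position. This is only an illustrative sketch: the hunk does not show the parameter names behind each position, so the positions are left unnamed here, and the helper `changed_positions` is hypothetical (it is not part of the test file).

```python
# Tuples copied verbatim from the diff; the meaning of each position is
# not visible in the hunk, so positions are reported by index only.
OLD_CASE = (16, 4, 8, 8192, 48, 1, 128, True)
NEW_CASE = (16, 4, 8, 1024, 4, 2, 128, True)


def changed_positions(old, new):
    """Hypothetical helper: map index -> (old_value, new_value) where they differ."""
    if len(old) != len(new):
        raise ValueError("test cases must have the same number of fields")
    return {i: (a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b}


if __name__ == "__main__":
    for index, (before, after) in changed_positions(OLD_CASE, NEW_CASE).items():
        print(f"position {index}: {before} -> {after}")
```

Three of the eight values change (indices 3, 4 and 5); the other five are untouched.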