Commit fedb75fa authored by Alexander Matveev, committed by GitHub

[Bugfix][B200] Fix `cutlass_mla` hang (#24966)


Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
parent bff2e5f1
@@ -133,6 +133,14 @@ public:
// printf(" sm_count = %d\n", sm_count);
int max_splits = ceil_div(K, 128);
max_splits = min(16, max_splits);
// TODO: This avoids a hang when the batch size is larger than 1 and
// there are more than 4 kv_splits.
// Discuss with NVIDIA how this can be fixed.
if (B > 1) {
max_splits = min(2, max_splits);
}
// printf(" max_splits = %d\n", max_splits);
int sms_per_batch = max(1, sm_count / B);
// printf(" sms_per_batch = %d\n", sms_per_batch);
......
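
To make the effect of the new clamp easier to see, the heuristic in the hunk above can be pulled out into a small standalone sketch. The function names `compute_max_splits` and `compute_sms_per_batch` are hypothetical and do not exist in the patched file, and `ceil_div` is assumed to be ordinary round-up integer division; only the arithmetic is taken from the diff.

```cpp
#include <algorithm>

// Assumed semantics: round-up integer division for positive operands.
inline int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Hypothetical helper mirroring the hunk: B is the batch size and K is the
// KV length, as the variable names in the diff suggest.
inline int compute_max_splits(int B, int K) {
  // One split per 128 KV elements, capped at 16.
  int max_splits = std::min(16, ceil_div(K, 128));
  // Workaround from this commit: with more than one batch, cap at 2 splits.
  if (B > 1) {
    max_splits = std::min(2, max_splits);
  }
  return max_splits;
}

// Hypothetical helper for the line that follows the clamp in the hunk:
// each batch gets at least one SM.
inline int compute_sms_per_batch(int B, int sm_count) {
  return std::max(1, sm_count / B);
}
```

With this sketch, `compute_max_splits(1, 4096)` still yields 16, while any batch size above 1 yields at most 2, which stays below the "more than 4 kv_splits" condition named in the TODO.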