Unverified commit 195d1ca3, authored by Woosuk Kwon, committed by GitHub

[Minor] Enhance error message for TRTLLM decode uniformity check (#36609)


Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
parent 8d983d7c
@@ -1110,7 +1110,8 @@ class FlashInferMetadataBuilder(AttentionMetadataBuilder[FlashInferMetadata]):
 if num_decodes > 0:
     if decode_use_trtllm:
         assert num_decode_tokens % num_decodes == 0, (
-            "TRTLLM decode requires uniform query lengths per request."
+            "TRTLLM decode requires uniform query lengths per request. "
+            f"Got {num_decode_tokens=} and {num_decodes=}."
         )
         attn_metadata.decode = TRTLLMDecode(
             block_tables=block_table_tensor[:num_decodes],
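To illustrate what the enhanced message looks like when the check fails, here is a minimal standalone sketch. The helper names `check_uniform_decode` and `failure_message` are hypothetical and not part of the commit; only the assertion condition and message text are taken from the diff. The returned per-request query length is an addition for illustration.

```python
def check_uniform_decode(num_decode_tokens: int, num_decodes: int) -> int:
    """Hypothetical helper: mirror the assertion from the diff.

    Returns the per-request query length when the token count divides
    evenly across decode requests (the return value is illustrative only).
    """
    assert num_decode_tokens % num_decodes == 0, (
        "TRTLLM decode requires uniform query lengths per request. "
        f"Got {num_decode_tokens=} and {num_decodes=}."
    )
    return num_decode_tokens // num_decodes


def failure_message(num_decode_tokens: int, num_decodes: int) -> str:
    """Hypothetical helper: return the assertion message, or "" on success."""
    try:
        check_uniform_decode(num_decode_tokens, num_decodes)
    except AssertionError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(check_uniform_decode(8, 4))
    print(failure_message(7, 3))
```

The `{name=}` f-string form prints both the variable name and its value, so the offending counts appear directly in the failure message. Note that the modulo test only verifies divisibility: it is a necessary condition for uniform query lengths, not a proof of them.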