Unverified commit 5b55c0be, authored by Francesco Fusco, committed by GitHub

[Attention] Clarify comment explaining attn_logits +1 dimension (#33427)


Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>
parent 15e0bb9c
@@ -143,8 +143,8 @@ class TritonMLAImpl(MLACommonImpl[MLACommonMetadata]):
                 B,
                 q_num_heads,
                 num_kv_splits,
-                # NOTE(lucas) idk why the +1 is here but sglang has it so we
-                # just mirror that
+                # NOTE: the +1 stores the LogSumExp (LSE) that the stage2
+                # kernel uses to merge partial attention outputs across splits.
                 self.kv_lora_rank + 1,
             ),
             dtype=torch.float32,
......
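The new comment says the extra slot holds the LSE that stage 2 uses to merge per-split results. The following is a minimal pure-Python sketch of that merge, not the Triton kernels themselves. The helper names (`partial_attention`, `merge_splits`), the toy data, and the choice to put the LSE in the last slot of each row are assumptions made for illustration; the diff only states that the +1 stores the LSE.

```python
import math


def partial_attention(q, keys, values):
    """Attend over one KV split; return (normalized output, LSE of the scores)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total
           for d in range(dim)]
    return out, m + math.log(total)


def merge_splits(rows):
    """Merge rows of length dim + 1, where the last slot is assumed to hold the LSE."""
    global_max = max(row[-1] for row in rows)
    # exp(lse_i - global_max) is proportional to split i's softmax denominator.
    scales = [math.exp(row[-1] - global_max) for row in rows]
    denom = sum(scales)
    dim = len(rows[0]) - 1
    return [sum(s * row[d] for s, row in zip(scales, rows)) / denom
            for d in range(dim)]


# Toy inputs (hypothetical): one query, four keys/values, output width 3.
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [0.0, 1.0, 0.0]]

# "Stage 1": each split writes its output plus one extra slot for its LSE.
num_kv_splits = 2
chunk = len(keys) // num_kv_splits
rows = []
for i in range(num_kv_splits):
    out, lse = partial_attention(q, keys[i * chunk:(i + 1) * chunk],
                                 values[i * chunk:(i + 1) * chunk])
    rows.append(out + [lse])

# "Stage 2": merge the splits using the stored LSE values.
merged = merge_splits(rows)

# Reference: attention over all keys at once.
full, _ = partial_attention(q, keys, values)
```

Each split normalizes by its own softmax denominator, so the partial outputs alone cannot be averaged correctly. Weighting each one by `exp(lse_i)` restores the relative denominators, which is why the buffer needs one more element per row than the output width.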