[Bugfix] Remove contiguous output req for context parallel MLA (#25414)

Signed-off-by: Michael Goin <mgoin64@gmail.com>

Commit 78237e43, authored Sep 22, 2025 by Michael Goin, committed by GitHub Sep 22, 2025

Showing 1 changed file with 0 additions and 1 deletion

vllm/attention/ops/common.py +0 -1

--- a/vllm/attention/ops/common.py
+++ b/vllm/attention/ops/common.py
@@ -134,6 +134,5 @@ def cp_lse_ag_out_rs(cp_attn_out: torch.Tensor,
     cp_attn_lse = cp_attn_lse.contiguous()
     lses = cp_group.all_gather(cp_attn_lse, dim=0).view_as(lses)
     out, _ = correct_attn_out(cp_attn_out, lses, cp_group.rank_in_group, ctx)
-    assert out.is_contiguous()
     out = cp_group.reduce_scatter(out, dim=1)
     return out
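
Note on the removed check: the deleted line asserted that `out` was laid out contiguously in memory before the reduce-scatter. The commit message does not say which code path produces a non-contiguous `out`, so the following is only a minimal, pure-Python sketch of what a row-major contiguity check of this kind tests, written without PyTorch so it runs anywhere. The helper names (`contiguous_strides`, `is_row_major_contiguous`, `transpose`) are hypothetical and are not part of vLLM or PyTorch; strides are counted in elements, as PyTorch reports them.

```python
def contiguous_strides(shape):
    """Strides (in elements) of a densely packed row-major layout."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= max(size, 1)
    return tuple(reversed(strides))


def is_row_major_contiguous(shape, strides):
    """Illustrative stand-in for a contiguity check (hypothetical helper).

    A layout is treated as contiguous when every dimension of size > 1
    has the stride a dense row-major layout would give it. Size-1
    dimensions are ignored, and empty layouts count as contiguous.
    """
    if 0 in shape:
        return True
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True


def transpose(shape, strides, dim0, dim1):
    """Swap two dimensions without moving data, as a view would."""
    shape, strides = list(shape), list(strides)
    shape[dim0], shape[dim1] = shape[dim1], shape[dim0]
    strides[dim0], strides[dim1] = strides[dim1], strides[dim0]
    return tuple(shape), tuple(strides)


# A dense (2, 3, 4) layout passes the check.
shape = (2, 3, 4)
strides = contiguous_strides(shape)
assert is_row_major_contiguous(shape, strides)

# A transposed view of the same data keeps the old strides and fails it.
t_shape, t_strides = transpose(shape, strides, 0, 1)
assert not is_row_major_contiguous(t_shape, t_strides)

# Repacking the transposed shape densely (what a copy would do) passes again.
assert is_row_major_contiguous(t_shape, contiguous_strides(t_shape))
```

Under this reading, an assertion like the removed one fails for any view whose strides differ from the dense layout, even when the values themselves are correct.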