OpenDAS / vllm_cscc
Commit aae725af (Unverified)
Authored Sep 15, 2025 by Alexander Matveev
Committed by GitHub, Sep 15, 2025
[Performance] Remove redundant clone() calls in cutlass_mla (#24891)
Parent: 73df49ef
Showing 1 changed file with 8 additions and 8 deletions
vllm/v1/attention/backends/mla/cutlass_mla.py
...
...
@@ -210,9 +210,14 @@ class CutlassMLAImpl(MLACommonImpl[MLACommonMetadata]):
             sm_scale,
             num_kv_splits,
         )
-        returned_lse = lse[:, :H].contiguous(
-        ) if self.need_to_return_lse_for_decode else lse
-        return out[:, :H].contiguous(), returned_lse
+
+        if H < MAX_HEADS:
+            # Extract the subsets of the outputs
+            returned_lse = lse[:, :H].contiguous(
+            ) if self.need_to_return_lse_for_decode else lse
+            out = out[:, :H]
+
+        return out, returned_lse
 
     def _sm100_forward_decode(
         self,
...
...
@@ -228,11 +233,6 @@ class CutlassMLAImpl(MLACommonImpl[MLACommonMetadata]):
         self._workspace.ensure_size(attn_metadata, self._num_kv_splits)
 
         # Run MLA
-        # Clone q_nope and q_pe to make sure strides computation is correct.
-        # TODO: Check if we really need it
-        q_nope = q_nope.clone()
-        q_pe = q_pe.clone()
-
         o, lse = self._sm100_cutlass_mla_decode(
             q_nope,
             q_pe,
...
...
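The first hunk above trims the padded head dimension only when padding was actually applied (H < MAX_HEADS), and no longer forces a copy of `out`. A minimal, dependency-free sketch of that control flow follows. It makes several assumptions: plain Python lists stand in for the tensors, `extract_outputs` and its parameter names are hypothetical, and the value of `MAX_HEADS` is illustrative only. The hunk does not show whether `returned_lse` is assigned before the `if`, so the sketch defaults it to `lse` to keep the name bound in the unpadded case.

```python
# Hypothetical stand-in for the padded head count; the real value is not
# shown in the diff.
MAX_HEADS = 128


def extract_outputs(out, lse, num_heads, need_lse):
    """Trim padded head columns only when padding was actually added.

    `out` and `lse` are lists of rows (one list of per-head values per
    token); they stand in for the tensors in the diff.
    """
    # Assumption: default to the untrimmed lse so the name is always bound.
    returned_lse = lse
    if num_heads < MAX_HEADS:
        # Extract the subsets of the outputs.
        if need_lse:
            returned_lse = [row[:num_heads] for row in lse]
        out = [row[:num_heads] for row in out]
    return out, returned_lse


# Padded case: 2 tokens, 4 real heads padded up to MAX_HEADS.
padded_out = [[float(h) for h in range(MAX_HEADS)] for _ in range(2)]
padded_lse = [[0.0] * MAX_HEADS for _ in range(2)]
trimmed_out, trimmed_lse = extract_outputs(padded_out, padded_lse, 4, True)

# Unpadded case: the inputs are returned as-is, with no copy made.
same_out, same_lse = extract_outputs(padded_out, padded_lse, MAX_HEADS, True)
```

In the unpadded case the function returns the very objects it was given, which mirrors the intent of the commit title: skip work that does not change the result.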