OpenDAS
vllm_cscc
Commit 46ab154b
Authored Mar 17, 2026 by 王敏

[perf] Eliminate copy-scheduling bubbles during sparse MLA build

Parent: efa6bed2
Showing 1 changed file with 9 additions and 2 deletions

vllm/v1/attention/backends/mla/flashmla_sparse.py
...
...
@@ -456,6 +456,12 @@ class FlashMLASparseMetadataBuilder(AttentionMetadataBuilder[FlashMLASparseMetad
             dtype=torch.int32,
             device=device,
         )
+        self.req_id_per_token_buffer_cpu = torch.zeros(
+            (vllm_config.scheduler_config.max_num_batched_tokens,),
+            dtype=torch.int32,
+            device="cpu",
+            pin_memory=True)
+        self.req_id_per_token_buffer_np = self.req_id_per_token_buffer_cpu.numpy()

     def _build_fp8_mixed_decode_prefill(
         self,
...
...
@@ -651,9 +657,10 @@ class FlashMLASparseMetadataBuilder(AttentionMetadataBuilder[FlashMLASparseMetad
         )
         # Zero-fill for cudagraphs
         self.req_id_per_token_buffer.fill_(0)
+        self.req_id_per_token_buffer_np[:req_id_per_token.shape[0]] = req_id_per_token
         self.req_id_per_token_buffer[:req_id_per_token.shape[0]].copy_(
-            torch.from_numpy(req_id_per_token), non_blocking=True
-        )
+            self.req_id_per_token_buffer_cpu[:req_id_per_token.shape[0]],
+            non_blocking=True)
         req_id_per_token = self.req_id_per_token_buffer[:num_tokens]

         fp8_extra_metadata: (
...
...
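Note: the pattern this change relies on can be sketched in isolation. The snippet below is a minimal illustration, not the vLLM code itself. `MAX_TOKENS`, `stage_and_copy`, and the buffer names are hypothetical stand-ins, and it falls back to unpinned CPU memory when no GPU is present. The idea is to allocate a pinned host buffer once, expose it as a NumPy view, write each step's values into that view, and issue the host-to-device copy from the pinned tensor. A `non_blocking=True` copy from pageable memory (such as the tensor returned by `torch.from_numpy` on an ordinary array) generally cannot overlap with host work, which is presumably the bubble the commit title refers to.

```python
import numpy as np
import torch

# Hypothetical stand-in for scheduler_config.max_num_batched_tokens.
MAX_TOKENS = 8192

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Persistent destination buffer on the target device.
device_buf = torch.zeros((MAX_TOKENS,), dtype=torch.int32, device=device)

# Persistent host staging buffer, pinned only when a GPU is available
# (requesting pinned memory without an accelerator raises an error).
staging_cpu = torch.zeros(
    (MAX_TOKENS,), dtype=torch.int32, device="cpu", pin_memory=use_cuda
)
# NumPy view that shares storage with staging_cpu.
staging_np = staging_cpu.numpy()


def stage_and_copy(values: np.ndarray) -> torch.Tensor:
    """Write values into the staging buffer, then copy them to the device."""
    n = values.shape[0]
    staging_np[:n] = values  # plain host-side write into pinned memory
    # With a pinned source, this copy can be queued without blocking the host.
    device_buf[:n].copy_(staging_cpu[:n], non_blocking=True)
    return device_buf[:n]


ids = np.array([0, 0, 1, 1, 1, 2], dtype=np.int32)
out = stage_and_copy(ids)
if use_cuda:
    torch.cuda.synchronize()  # wait for the queued copy before reading
```

One caveat with a reused staging buffer: the host must not overwrite it while an earlier asynchronous copy may still be reading from it, so the caller needs some synchronization between steps. Whether the surrounding builder already guarantees that is not visible in this diff.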