OpenDAS / vllm_cscc
Commit c462f3a0
Authored Apr 17, 2026 by wanghl6
[FIX] Reduce mqa_logits GPU memory usage
Parent: 3c0e74be
Showing 1 changed file with 1 addition and 7 deletions
vllm/model_executor/layers/sparse_attn_indexer.py (+1 -7)
...
@@ -51,13 +51,7 @@ def sparse_attn_indexer(
     # careful! this will be None in dummy run
     attn_metadata = get_forward_context().attn_metadata
     fp8_dtype = current_platform.fp8_dtype()
-    if q_fp8.dtype == fp8_dtype:
-        MAX_ELEMENTS = 65536 * 65536
-    elif q_fp8.dtype in (torch.bfloat16, torch.float16):
-        MAX_ELEMENTS = 16384 * 32768
-    else:
-        MAX_ELEMENTS = 16384 * 32768
+    MAX_ELEMENTS = 16384 * 16384
     device = q_fp8.device
     if device not in _GLOBAL_LOGITS_BUFFERS or _GLOBAL_LOGITS_BUFFERS[device].numel() < MAX_ELEMENTS:
         _GLOBAL_LOGITS_BUFFERS[device] = torch.empty(
...
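To put the MAX_ELEMENTS values in this hunk in perspective, the sketch below converts each expression (65536 * 65536, 16384 * 32768, 16384 * 16384) into an element count and an approximate buffer size. The hunk does not show the dtype passed to torch.empty, so the 4 bytes per element (float32) used here is an assumption, and the helper name buffer_gib is hypothetical. Treat the numbers as a rough estimate, not a measurement.

```python
# Element counts taken from the MAX_ELEMENTS expressions that appear in the hunk.
CANDIDATES = {
    "65536 * 65536": 65536 * 65536,
    "16384 * 32768": 16384 * 32768,
    "16384 * 16384": 16384 * 16384,
}

# Assumption: 4 bytes per element (float32). The actual buffer dtype is not
# visible in the hunk, so adjust this value if the buffer uses another dtype.
ASSUMED_BYTES_PER_ELEMENT = 4


def buffer_gib(num_elements, bytes_per_element=ASSUMED_BYTES_PER_ELEMENT):
    """Return the size in GiB of a buffer holding num_elements elements."""
    return num_elements * bytes_per_element / 2**30


for expr, count in CANDIDATES.items():
    print(f"{expr:>15} -> {count:>13,d} elements, {buffer_gib(count):5.1f} GiB")
```

Under that assumption, a 65536 * 65536 buffer would need 16 GiB per device, while 16384 * 16384 needs 1 GiB, which is consistent with the commit title's stated goal of reducing mqa_logits memory usage.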