OpenDAS / vllm_cscc
Commit 6d4e27ce (Unverified)
Authored Feb 12, 2026 by Michael Goin
Committed by GitHub, Feb 12, 2026

[Bugfix] Enforce DeepGEMM when using sparse_attn_indexer on CUDA (#34374)

Signed-off-by: mgoin <mgoin64@gmail.com>

Parent: 4c078fa5

Changes: 1
Showing 1 changed file with 5 additions and 0 deletions

vllm/model_executor/layers/sparse_attn_indexer.py (+5 -0)
...
@@ -10,6 +10,7 @@ from vllm.logger import init_logger
 from vllm.model_executor.custom_op import CustomOp
 from vllm.platforms import current_platform
 from vllm.utils.deep_gemm import fp8_mqa_logits, fp8_paged_mqa_logits
+from vllm.utils.import_utils import has_deep_gemm
 from vllm.utils.torch_utils import direct_register_custom_op
 from vllm.v1.attention.backends.mla.indexer import (
     DeepseekV32IndexerMetadata,
...
@@ -277,6 +278,10 @@ class SparseAttnIndexer(CustomOp):
         self.max_model_len = max_model_len
         self.max_total_seq_len = max_total_seq_len
         self.topk_indices_buffer = topk_indices_buffer
+        if current_platform.is_cuda() and not has_deep_gemm():
+            raise RuntimeError(
+                "Sparse Attention Indexer CUDA op requires DeepGEMM to be installed."
+            )

     def forward_native(
         self,
...
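
The guard added in this commit follows a fail-fast pattern: check for an optional dependency when the layer is constructed, so a missing package surfaces as a clear error at startup rather than later inside a kernel call. The sketch below illustrates that pattern in isolation. It is not vLLM's implementation: `module_available`, `IndexerSketch`, and the `on_cuda` flag are hypothetical stand-ins for `has_deep_gemm()`, `SparseAttnIndexer`, and `current_platform.is_cuda()`, and the default module name `deep_gemm` is an assumption about how the package is imported.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


class IndexerSketch:
    """Hypothetical stand-in for a layer whose CUDA path needs an optional package."""

    def __init__(self, on_cuda: bool, kernel_module: str = "deep_gemm") -> None:
        # Fail fast at construction time, mirroring the guard in the diff above:
        # only the CUDA path requires the optional package.
        if on_cuda and not module_available(kernel_module):
            raise RuntimeError(
                f"CUDA path requires the '{kernel_module}' package to be installed."
            )
        self.on_cuda = on_cuda
```

Constructing `IndexerSketch(on_cuda=False)` succeeds regardless of what is installed, while `IndexerSketch(on_cuda=True)` raises `RuntimeError` when the named module cannot be found. Placing the check in `__init__` keeps the failure close to configuration time, which appears to be the intent of the change.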