sglang
Commit 90dfe3de
Authored Sep 05, 2025 by Kaixi Hou
Committed by GitHub, Sep 06, 2025

[NVIDIA] disable chunked prefix cache when dp and blackwell is used (#9861)

Parent: 9a719b7a
Changes: 1
Showing 1 changed file with 11 additions and 0 deletions

python/sglang/srt/model_executor/model_runner.py (+11 -0)
@@ -525,6 +525,17 @@ class ModelRunner:
         if not self.use_mla_backend:
             server_args.disable_chunked_prefix_cache = True
+        # TODO(kaixih@nvidia): remove this once we have a better solution for DP attention.
+        # For more details, see: https://github.com/sgl-project/sglang/issues/8616
+        elif (
+            self.dp_size > 1
+            and is_sm100_supported()
+            and server_args.attention_backend != "triton"
+        ):
+            logger.info(
+                "Disable chunked prefix cache when dp size > 1 and attention backend is not triton."
+            )
+            server_args.disable_chunked_prefix_cache = True
         if not server_args.disable_chunked_prefix_cache:
             logger.info("Chunked prefix cache is turned on.")
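
The branch added by this commit can be read as a small decision function. The following is a minimal standalone sketch of that logic, not the code in `model_runner.py`: `ServerArgsSketch` and `resolve_chunked_prefix_cache` are hypothetical names introduced here for illustration, the result of `is_sm100_supported()` is passed in as a plain boolean, and `"other"` is a placeholder for any non-triton backend value.

```python
from dataclasses import dataclass


@dataclass
class ServerArgsSketch:
    """Hypothetical stand-in for the two server_args fields the diff touches."""

    attention_backend: str = "triton"
    disable_chunked_prefix_cache: bool = False


def resolve_chunked_prefix_cache(
    args: ServerArgsSketch,
    use_mla_backend: bool,
    dp_size: int,
    sm100_supported: bool,
) -> bool:
    """Apply the same branching as the diff; return True if the cache stays on."""
    if not use_mla_backend:
        # Pre-existing behavior: non-MLA backends never use the chunked prefix cache.
        args.disable_chunked_prefix_cache = True
    elif dp_size > 1 and sm100_supported and args.attention_backend != "triton":
        # Branch added by this commit: DP attention on SM100 with a non-triton backend.
        args.disable_chunked_prefix_cache = True
    return not args.disable_chunked_prefix_cache


if __name__ == "__main__":
    # "other" is a placeholder for any backend name different from "triton".
    args = ServerArgsSketch(attention_backend="other")
    enabled = resolve_chunked_prefix_cache(
        args, use_mla_backend=True, dp_size=2, sm100_supported=True
    )
    print("chunked prefix cache enabled:", enabled)
```

Under these assumptions, the cache is switched off only when all three new conditions hold together; with `dp_size == 1`, on hardware without SM100 support, or with the triton backend, an MLA configuration keeps whatever value `disable_chunked_prefix_cache` already had.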