Commit 3fa3c22a
Authored Sep 19, 2025 by Baizhou Zhang; committed by GitHub on Sep 19, 2025

Fix fast decode plan for flashinfer v0.4.0rc1 and upgrade sgl-kernel 0.3.11 (#10634)

Co-authored-by: zhyncs <me@zhyncs.com>

Parent: 4f2055ad

Showing 5 changed files with 10 additions and 7 deletions (+10 -7)

docker/Dockerfile  +1 -1
python/pyproject.toml  +2 -2
python/pyproject_other.toml  +2 -2
python/sglang/srt/entrypoints/engine.py  +2 -2
python/sglang/srt/layers/attention/flashinfer_backend.py  +3 -0

docker/Dockerfile
@@ -85,7 +85,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip setuptools wheel html5li
      && python3 -m pip install --no-cache-dir nvidia-nccl-cu12==2.27.6 --force-reinstall --no-deps \
      && python3 -m flashinfer --download-cubin \
      && if [ "$CUDA_VERSION" = "12.6.1" ]; then \
-         python3 -m pip install --no-cache-dir https://github.com/sgl-project/whl/releases/download/v0.3.10/sgl_kernel-0.3.10+cu124-cp310-abi3-manylinux2014_x86_64.whl --force-reinstall --no-deps ; \
+         python3 -m pip install --no-cache-dir https://github.com/sgl-project/whl/releases/download/v0.3.11/sgl_kernel-0.3.11+cu124-cp310-abi3-manylinux2014_x86_64.whl --force-reinstall --no-deps ; \
      fi
  # Download source files

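The wheel URL in the Dockerfile hunk above embeds the sgl-kernel version twice: once in the release tag and once in the file name. Both must be bumped together. A minimal sketch of that structure follows; the helper name `sgl_kernel_wheel_url` is hypothetical and is not part of the repository, and the `cu124`/`cp310` tags are simply copied from the diff.

```python
def sgl_kernel_wheel_url(version: str) -> str:
    """Build the cu124 wheel URL in the shape used by the Dockerfile hunk.

    Hypothetical helper for illustration only; the version appears in both
    the release tag (v<version>) and the wheel file name.
    """
    base = "https://github.com/sgl-project/whl/releases/download"
    wheel = f"sgl_kernel-{version}+cu124-cp310-abi3-manylinux2014_x86_64.whl"
    return f"{base}/v{version}/{wheel}"


if __name__ == "__main__":
    print(sgl_kernel_wheel_url("0.3.11"))
```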
python/pyproject.toml
@@ -57,12 +57,12 @@ dependencies = [
   "uvicorn",
   "uvloop",
   "xgrammar==0.1.24",
-  "sgl-kernel==0.3.10",
+  "sgl-kernel==0.3.11",
   "torch==2.8.0",
   "torchaudio==2.8.0",
   "torchvision",
   "cuda-python",
-  "flashinfer_python==0.3.1",
+  "flashinfer_python==0.4.0rc1",
   "openai==1.99.1",
   "tiktoken",
   "anthropic>=0.20.0",

python/pyproject_other.toml
@@ -65,7 +65,7 @@ tracing = [
 srt = [
   "sglang[runtime_common]",
-  "sgl-kernel==0.3.10",
+  "sgl-kernel==0.3.11",
   "torch==2.8.0",
   "torchaudio==2.8.0",
   "torchvision",
@@ -75,7 +75,7 @@ srt = [
 blackwell = [
   "sglang[runtime_common]",
-  "sgl-kernel==0.3.10",
+  "sgl-kernel==0.3.11",
   "torch==2.8.0",
   "torchaudio==2.8.0",
   "torchvision",

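The same `sgl-kernel` pin appears in several dependency lists across the two pyproject files, so a bump like this one has to touch each of them. The sketch below shows one way such pins could be cross-checked. The function names are hypothetical, and the lists are abbreviated copies of the entries visible in the hunks above, not the full files.

```python
import re

# Matches exact pins such as "sgl-kernel==0.3.11"; extras and ranges are skipped.
PIN_RE = re.compile(r"^([A-Za-z0-9_.\-]+)==(.+)$")


def pinned_versions(requirements):
    """Map package name -> pinned version for 'name==version' entries."""
    pins = {}
    for req in requirements:
        match = PIN_RE.match(req)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def mismatched_pins(*groups):
    """Return {name: versions} for packages pinned differently across groups."""
    seen = {}
    for group in groups:
        for name, version in pinned_versions(group).items():
            seen.setdefault(name, set()).add(version)
    return {name: versions for name, versions in seen.items() if len(versions) > 1}


# Abbreviated post-commit lists, taken from the hunks above.
main_deps = ["xgrammar==0.1.24", "sgl-kernel==0.3.11", "torch==2.8.0",
             "torchvision", "flashinfer_python==0.4.0rc1", "anthropic>=0.20.0"]
srt_deps = ["sglang[runtime_common]", "sgl-kernel==0.3.11", "torch==2.8.0",
            "torchaudio==2.8.0", "torchvision"]

if __name__ == "__main__":
    print(mismatched_pins(main_deps, srt_deps))
```

An empty result means every exactly pinned package agrees across the lists; a stale `sgl-kernel==0.3.10` left in one list would show up as a two-element set.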
python/sglang/srt/entrypoints/engine.py
@@ -703,7 +703,7 @@ def _set_envs_and_config(server_args: ServerArgs):
     if server_args.attention_backend == "flashinfer":
         assert_pkg_version(
             "flashinfer_python",
-            "0.3.1",
+            "0.4.0rc1",
             "Please uninstall the old version and "
             "reinstall the latest version by following the instructions "
             "at https://docs.flashinfer.ai/installation.html.",
@@ -711,7 +711,7 @@ def _set_envs_and_config(server_args: ServerArgs):
     if _is_cuda and not get_bool_env_var("SGLANG_SKIP_SGL_KERNEL_VERSION_CHECK"):
         assert_pkg_version(
             "sgl-kernel",
-            "0.3.10",
+            "0.3.11",
             "Please reinstall the latest version with `pip install sgl-kernel --force-reinstall`",
         )

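The diff only changes the arguments passed to `assert_pkg_version`; its implementation is not shown. As a rough illustration of what a minimum-version check has to handle for the two strings involved here (`0.3.11` must sort after `0.3.10`, and `0.4.0rc1` must sort after `0.3.1` but before a final `0.4.0`), here is a simplified sketch. `parse_version` and `check_min_version` are hypothetical names, this is not sglang's code, and the parser covers only `X.Y.Z` with an optional `rcN` suffix rather than full PEP 440.

```python
import re


def parse_version(text: str):
    """Parse 'X.Y.Z' with an optional 'rcN' suffix into a sortable tuple.

    Simplified on purpose: no epochs, dev/post releases, or local versions.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:rc(\d+))?", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    release = [int(part) for part in match.group(1).split(".")]
    # Drop trailing zeros so that 0.4 and 0.4.0 compare as equal.
    while len(release) > 1 and release[-1] == 0:
        release.pop()
    # A release candidate sorts before the final release of the same number.
    pre = (0, int(match.group(2))) if match.group(2) else (1, 0)
    return tuple(release), pre


def check_min_version(installed: str, minimum: str, hint: str) -> None:
    """Raise if `installed` is older than `minimum` (hypothetical helper)."""
    if parse_version(installed) < parse_version(minimum):
        raise RuntimeError(
            f"installed version {installed} is older than required {minimum}. {hint}"
        )


if __name__ == "__main__":
    check_min_version("0.3.11", "0.3.11", "reinstall sgl-kernel")  # passes silently
    try:
        check_min_version("0.3.1", "0.4.0rc1", "reinstall flashinfer_python")
    except RuntimeError as err:
        print(err)
```

Comparing integer tuples instead of raw strings is the point of the sketch: a plain string comparison would rank `0.3.9` above `0.3.10`.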
python/sglang/srt/layers/attention/flashinfer_backend.py
@@ -1432,6 +1432,9 @@ def fast_decode_plan(
                     head_dim,
                     head_dim,
                     False,  # causal
+                    window_left,
+                    -1,
+                    False,
                 )
             except Exception as e:
                 raise RuntimeError(f"Error in standard plan: {e}")

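The hunk above appends three positional arguments to a call that sits inside a `try` block, and any failure is re-raised as `RuntimeError`. The sketch below illustrates why an arity mismatch against a changed callee surfaces through that wrapper. `cached_plan` is a hypothetical stand-in, NOT the flashinfer API, and its six-argument signature is invented purely for the demonstration. Using `from e` is a small variation on the code in the diff that keeps the original exception attached as `__cause__`.

```python
def cached_plan(*args):
    """Hypothetical stand-in for a planner that needs exactly six positional args."""
    if len(args) != 6:
        raise TypeError(f"expected 6 positional arguments, got {len(args)}")
    return "planned"


def run_plan(args):
    """Call the planner and wrap any failure, as the code in the diff does."""
    try:
        return cached_plan(*args)
    except Exception as e:
        # `from e` is an addition here; it preserves the original traceback.
        raise RuntimeError(f"Error in standard plan: {e}") from e


if __name__ == "__main__":
    print(run_plan(["a", "b", "c", "d", "e", "f"]))
    try:
        run_plan(["a", "b", "c"])  # too few arguments for the stand-in
    except RuntimeError as err:
        print(err, "| caused by:", type(err.__cause__).__name__)
```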