change / sglang / Commits / de15d140

Commit de15d140 (Unverified)
Authored Sep 11, 2025 by Yineng Zhang
Committed by GitHub, Sep 11, 2025

Revert "Fix flashinfer version in sgl-kernel (#10135)" (#10310)

Parent: 37367da6
Showing 2 changed files with 2 additions and 6 deletions:
python/sglang/srt/layers/attention/flashinfer_backend.py (+1 -5)
sgl-kernel/CMakeLists.txt (+1 -1)
python/sglang/srt/layers/attention/flashinfer_backend.py
@@ -1187,7 +1187,7 @@ class FlashInferMultiStepDraftBackend:
     def init_cuda_graph_state(self, max_bs: int, max_num_tokens: int):
         self.cuda_graph_kv_indices = torch.zeros(
-            (self.speculative_num_steps, max_bs * self.topk * self.max_context_len),
+            (self.speculative_num_steps, max_bs * self.max_context_len),
             dtype=torch.int32,
             device="cuda",
         )
@@ -1349,10 +1349,6 @@ def fast_decode_plan(
...
@@ -1349,10 +1349,6 @@ def fast_decode_plan(
self
.
device
,
non_blocking
=
non_blocking
self
.
device
,
non_blocking
=
non_blocking
)
)
# TODO:
# We want to cache `empty_q_data`, `empty_kv_cache`, `last_page_len_host` (if it is ones) in the wrapper
# so that we do not need to create them every time.
# Create empty tensors for dtype info if needed
# Create empty tensors for dtype info if needed
empty_q_data
=
torch
.
empty
(
empty_q_data
=
torch
.
empty
(
0
,
0
,
...
...
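
Note on the first hunk: the revert drops the `self.topk` factor from the second dimension of `cuda_graph_kv_indices`, so the preallocated index buffer shrinks by a factor of `topk`. The sketch below only illustrates that arithmetic in plain Python. The helper names (`kv_indices_shape`, `int32_bytes`) and all numeric values are hypothetical and are not taken from sglang.

```python
def kv_indices_shape(num_steps, max_bs, max_context_len, topk=None):
    """Return the (rows, cols) shape of the index buffer.

    topk=None models the sizing after this revert; passing an integer
    models the sizing that the reverted commit had introduced.
    """
    cols = max_bs * max_context_len
    if topk is not None:
        cols *= topk
    return (num_steps, cols)


def int32_bytes(shape):
    """Size in bytes of an int32 buffer with the given 2-D shape."""
    rows, cols = shape
    return rows * cols * 4  # an int32 element occupies 4 bytes


# Hypothetical values, chosen only for illustration.
after = kv_indices_shape(num_steps=3, max_bs=8, max_context_len=4096)
before = kv_indices_shape(num_steps=3, max_bs=8, max_context_len=4096, topk=4)

print("after revert: ", after, int32_bytes(after), "bytes")
print("before revert:", before, int32_bytes(before), "bytes")
```

With these assumed values the pre-revert buffer is exactly `topk` times larger than the post-revert one.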
sgl-kernel/CMakeLists.txt
...
@@ -81,7 +81,7 @@ FetchContent_Populate(repo-triton)
 FetchContent_Declare(
     repo-flashinfer
     GIT_REPOSITORY https://github.com/flashinfer-ai/flashinfer.git
-    GIT_TAG        1a85c439a064c1609568675aa580a409a53fb183
+    GIT_TAG        018b551825c8e5579206e6eb9d3229fa679202b3
     GIT_SHALLOW    OFF
 )
 FetchContent_Populate(repo-flashinfer)
...
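
Note on the CMake hunk: the change swaps one pinned flashinfer commit for another in `GIT_TAG`. As a rough sketch of how such a pin could be checked outside of CMake, the Python snippet below extracts the `GIT_TAG` from a `FetchContent_Declare` block and tests whether it is a full 40-character hex SHA. The helpers `pinned_tag` and `is_full_sha` are hypothetical names and not part of sglang or CMake. The regular expression assumes the block contains no nested parentheses.

```python
import re

# Post-revert declaration, copied from the diff above.
CMAKE_SNIPPET = """
FetchContent_Declare(
    repo-flashinfer
    GIT_REPOSITORY https://github.com/flashinfer-ai/flashinfer.git
    GIT_TAG        018b551825c8e5579206e6eb9d3229fa679202b3
    GIT_SHALLOW    OFF
)
"""


def pinned_tag(cmake_text, name):
    """Return the GIT_TAG value declared for `name`, or None if absent."""
    pattern = r"FetchContent_Declare\(\s*" + re.escape(name) + r"\b(.*?)\)"
    block = re.search(pattern, cmake_text, re.DOTALL)
    if block is None:
        return None
    tag = re.search(r"GIT_TAG\s+(\S+)", block.group(1))
    return tag.group(1) if tag else None


def is_full_sha(tag):
    """True if `tag` looks like a full 40-character lowercase hex commit SHA."""
    return re.fullmatch(r"[0-9a-f]{40}", tag) is not None


tag = pinned_tag(CMAKE_SNIPPET, "repo-flashinfer")
print(tag, is_full_sha(tag))
```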