OpenDAS / vllm_cscc
Commit f296a196 (unverified)
Authored by Thomas Parnell on Mar 13, 2026; committed by GitHub on Mar 13, 2026.

[Bugfix] Fix FlashInfer GDN warmup ValueError on SM90 GPUs (#36876)

Parent: bc2c0c86
Showing 1 changed file with 8 additions and 2 deletions:
vllm/model_executor/models/qwen3_next.py (+8 -2)
```diff
--- a/vllm/model_executor/models/qwen3_next.py
+++ b/vllm/model_executor/models/qwen3_next.py
@@ -137,7 +137,7 @@ def fi_chunk_gated_delta_rule(
     fi_state = initial_state.to(torch.float32)
     fi_g = g.to(torch.float32)
     fi_beta = beta.to(torch.float32)
-    output, final_state = chunk_gated_delta_rule_fi(
+    result = chunk_gated_delta_rule_fi(
         q=q,
         k=k,
         v=v,
@@ -147,8 +147,14 @@ def fi_chunk_gated_delta_rule(
         output_final_state=output_final_state,
         cu_seqlens=cu_seqlens,
     )
+    # FlashInfer returns (output, state) when output_final_state=True,
+    # or just output when output_final_state=False.
     # Unsqueeze back to 4D (1, L, H, D) to match fla output format
-    return output.unsqueeze(0), final_state
+    if output_final_state:
+        output, final_state = result
+        return output.unsqueeze(0), final_state
+    else:
+        return result.unsqueeze(0), None
 
 
 @CustomOp.register("chunk_gated_delta_rule")
```
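The sketch below isolates the return-type mismatch that the added comment describes, using plain Python lists as a hypothetical stand-in for tensors so it runs without torch or FlashInfer. `fake_gdn_kernel`, `unsqueeze0`, `call_before_fix` and `call_after_fix` are illustrative names, not vLLM or FlashInfer APIs. It does not model SM90 hardware, and it assumes (the commit does not say so explicitly) that warmup reaches this path with `output_final_state=False`.

```python
from typing import Optional, Tuple, Union

# Hypothetical stand-in for a tensor: a plain Python list. Like a tensor,
# unpacking it with `a, b = t` iterates over its leading dimension, so it
# only succeeds when that dimension has exactly two entries.
Tensor = list


def fake_gdn_kernel(
    x: Tensor, output_final_state: bool
) -> Union[Tensor, Tuple[Tensor, Tensor]]:
    """Hypothetical kernel with the return convention described in the
    commit: (output, state) if output_final_state is True, else output only."""
    output = [v * 2 for v in x]
    if output_final_state:
        state = [sum(x)]
        return output, state
    return output


def unsqueeze0(t):
    """Add a leading size-1 dimension, mimicking tensor.unsqueeze(0)."""
    return [t]


def call_before_fix(x: Tensor, output_final_state: bool):
    # Pre-fix pattern: always unpacks two values, whatever the kernel returned.
    output, final_state = fake_gdn_kernel(x, output_final_state)
    return unsqueeze0(output), final_state


def call_after_fix(
    x: Tensor, output_final_state: bool
) -> Tuple[Tensor, Optional[Tensor]]:
    # Post-fix pattern: branch on the flag before unpacking, and return
    # None as the state when no final state was requested.
    result = fake_gdn_kernel(x, output_final_state)
    if output_final_state:
        output, final_state = result
        return unsqueeze0(output), final_state
    return unsqueeze0(result), None
```

With a leading dimension of 3, `call_before_fix([1, 2, 3], False)` raises a `ValueError` (too many values to unpack), while `call_after_fix` returns the unsqueezed output and `None`. When the leading dimension happens to be exactly 2, the pre-fix pattern does not raise at all; it silently splits the output into two pieces, which is why branching on the flag is safer than relying on the unpack failing.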