OpenDAS / vllm_cscc
Commit 000cceca (Unverified)
Authored Aug 16, 2025 by Michael Goin
Committed by GitHub, Aug 16, 2025
[Bugfix gpt-oss] Fix float32 convert for flashinfer sink support (#23016)
Signed-off-by: mgoin <mgoin64@gmail.com>
parent 68373d31
Showing 2 changed files with 9 additions and 3 deletions
vllm/attention/layer.py +9 -0
vllm/v1/attention/backends/flashinfer.py +0 -3
vllm/attention/layer.py
@@ -308,6 +308,15 @@ class Attention(nn.Module):
         if hasattr(self.impl, "process_weights_after_loading"):
             self.impl.process_weights_after_loading(act_dtype)
+
+        # FlashInfer requires attention sinks to be float32
+        if (self.backend == _Backend.FLASHINFER_VLLM_V1
+                and hasattr(self.impl, 'sinks')):
+            from vllm.v1.attention.backends.flashinfer import FlashInferImpl
+            assert isinstance(self.impl, FlashInferImpl)
+            if (self.impl.sinks is not None
+                    and self.impl.sinks.dtype != torch.float32):
+                self.impl.sinks = self.impl.sinks.to(torch.float32)
 
     def get_attn_backend(self) -> type[AttentionBackend]:
         return self.attn_backend
vllm/v1/attention/backends/flashinfer.py
@@ -642,9 +642,6 @@ class FlashInferImpl(AttentionImpl):
                     f"heads in the layer. Expected {num_heads}, but got "
                     f"{sinks.shape[0]}."
                 )
-            # Cast sinks to float32 if needed (FlashInfer requirement)
-            if sinks.dtype != torch.float32:
-                sinks = sinks.to(torch.float32)
             self.sinks = sinks
 
     def forward(
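Note on the change: the two hunks move the float32 cast of the sinks out of `FlashInferImpl.__init__` and into the post-load step in `Attention`. The commit does not state why. One plausible reading is that a cast done at construction time can be undone when weights are later loaded or converted to the model dtype. The sketch below illustrates only that ordering, using stand-ins: `FakeTensor`, `Impl`, `load_weights`, and the `cast_in_init` flag are hypothetical and are not vLLM or PyTorch APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor; it only tracks a dtype name."""
    dtype: str

    def to(self, dtype: str) -> "FakeTensor":
        # Mimics a dtype conversion by returning a new object.
        return FakeTensor(dtype)


class Impl:
    """Hypothetical attention impl holding optional sinks."""

    def __init__(self, sinks: Optional[FakeTensor], cast_in_init: bool):
        if sinks is not None and cast_in_init and sinks.dtype != "float32":
            sinks = sinks.to("float32")  # where the cast sat before this commit
        self.sinks = sinks


def load_weights(impl: Impl, model_dtype: str) -> None:
    # Assumption: loading converts parameters to the model dtype,
    # which would overwrite a cast performed earlier in __init__.
    if impl.sinks is not None:
        impl.sinks = impl.sinks.to(model_dtype)


def process_weights_after_loading(impl: Impl) -> None:
    # Mirrors the post-load check added in the commit: cast only if needed.
    if impl.sinks is not None and impl.sinks.dtype != "float32":
        impl.sinks = impl.sinks.to("float32")


# Cast at construction, then load: the cast does not survive loading.
early = Impl(FakeTensor("bfloat16"), cast_in_init=True)
load_weights(early, "bfloat16")

# Load first, then cast in the post-load step: the cast survives.
late = Impl(FakeTensor("bfloat16"), cast_in_init=False)
load_weights(late, "bfloat16")
process_weights_after_loading(late)
```

Under these assumptions, `early.sinks` ends up in the model dtype while `late.sinks` ends up as float32, which is the state the comment in the diff says FlashInfer requires.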