sglang · Commits · 63c13a2c

Unverified commit 63c13a2c, authored Apr 27, 2025 by Kyungmin Lee, committed by GitHub Apr 26, 2025.
fix: import vllm_rotary_embedding error when head_size not in 64, 128, 256, 512 (#5733)
Parent: 4d1e52ab

Showing 1 changed file with 7 additions and 3 deletions.

python/sglang/srt/layers/rotary_embedding.py (+7 -3)
@@ -14,8 +14,6 @@ _is_cuda = is_cuda()

 if _is_cuda:
     from sgl_kernel import apply_rope_with_cos_sin_cache_inplace
-else:
-    from vllm._custom_ops import rotary_embedding as vllm_rotary_embedding


 def _rotate_neox(x: torch.Tensor) -> torch.Tensor:
@@ -84,6 +82,12 @@ class RotaryEmbedding(CustomOp):
         # NOTE(ByronHsu): cache needs to be in FP32 for numerical stability
         if not _is_cuda:
             cache = cache.to(dtype)

+        if not _is_cuda or self.head_size not in [64, 128, 256, 512]:
+            from vllm._custom_ops import rotary_embedding
+
+            self.vllm_rotary_embedding = rotary_embedding
+
         self.cos_sin_cache: torch.Tensor
         self.register_buffer("cos_sin_cache", cache, persistent=False)
@@ -160,7 +164,7 @@ class RotaryEmbedding(CustomOp):
             )
         else:
             self.cos_sin_cache = self.cos_sin_cache.to(query.device, dtype=query.dtype)
-            vllm_rotary_embedding(
+            self.vllm_rotary_embedding(
                 positions,
                 query,
                 key,
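For context, the change replaces the unconditional module-level import of vllm's rotary_embedding with a lazy, per-instance import: the vllm op is imported inside the constructor, and only when sgl_kernel's fused CUDA path cannot be used (non-CUDA platforms, or a head_size outside 64/128/256/512); the fallback call site then goes through self.vllm_rotary_embedding. The following is a minimal standalone sketch of that dispatch pattern; the class and attribute names are illustrative, and the two kernel call signatures are assumptions based on this diff rather than a copy of the surrounding file.

# Minimal sketch of the lazy-import fallback introduced by this commit.
# Assumptions: RotaryEmbeddingSketch and _SUPPORTED_HEAD_SIZES are illustrative
# names; the kernel call signatures below are assumed, not shown in the diff.
import torch

_SUPPORTED_HEAD_SIZES = [64, 128, 256, 512]  # head sizes handled by sgl_kernel's rope
_is_cuda = torch.cuda.is_available()


class RotaryEmbeddingSketch:
    def __init__(self, head_size: int):
        self.head_size = head_size
        # Decide once, at construction time, whether the vllm fallback is needed.
        self.use_vllm_fallback = not _is_cuda or head_size not in _SUPPORTED_HEAD_SIZES
        if self.use_vllm_fallback:
            # Deferred import: before this commit the import ran at module load
            # time on every non-CUDA setup, so environments without vllm's
            # custom ops failed to import rotary_embedding.py at all.
            from vllm._custom_ops import rotary_embedding

            self.vllm_rotary_embedding = rotary_embedding

    def apply(self, positions, query, key, cos_sin_cache, is_neox=True):
        if self.use_vllm_fallback:
            # vllm's custom op rotates query/key in place.
            self.vllm_rotary_embedding(
                positions, query, key, self.head_size, cos_sin_cache, is_neox
            )
        else:
            # CUDA with a supported head size: use sgl_kernel's fused kernel.
            from sgl_kernel import apply_rope_with_cos_sin_cache_inplace

            apply_rope_with_cos_sin_cache_inplace(
                positions=positions,
                query=query,
                key=key,
                head_size=self.head_size,
                cos_sin_cache=cos_sin_cache,
                is_neox=is_neox,
            )
        return query, key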