OpenDAS / vllm_cscc
Commit 6cc1e7d9 (Unverified)
Authored Jul 01, 2025 by Li, Jiang; committed by GitHub, Jul 01, 2025

[CPU] Update custom ops for the CPU backend (#20255)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

Parent: 9909726d

Showing 3 changed files with 26 additions and 3 deletions:
vllm/model_executor/layers/utils.py (+23 -2)
vllm/model_executor/layers/vocab_parallel_embedding.py (+1 -1)
vllm/platforms/cpu.py (+2 -0)

vllm/model_executor/layers/utils.py @ 6cc1e7d9
...
@@ -63,7 +63,15 @@ def apply_penalties(logits: torch.Tensor, prompt_tokens_tensor: torch.Tensor,
     return logits


-def rocm_unquantized_gemm(x: torch.Tensor,
+def default_unquantized_gemm(layer: torch.nn.Module,
+                             x: torch.Tensor,
+                             weight: torch.Tensor,
+                             bias: Optional[torch.Tensor] = None):
+    return torch.nn.functional.linear(x, weight, bias)
+
+
+def rocm_unquantized_gemm(layer: torch.nn.Module,
+                          x: torch.Tensor,
                           weight: torch.Tensor,
                           bias: Optional[torch.Tensor] = None):
     from vllm.platforms.rocm import on_gfx9
...
@@ -89,7 +97,20 @@ def rocm_unquantized_gemm(x: torch.Tensor,
     return torch.nn.functional.linear(x, weight, bias)


+def cpu_unquantized_gemm(layer: torch.nn.Module,
+                         x: torch.Tensor,
+                         weight: torch.Tensor,
+                         bias: Optional[torch.Tensor] = None):
+    if getattr(layer, "use_cpu_sgl", False):
+        return torch.ops._C.weight_packed_linear(x, weight, bias, True)
+    else:
+        return torch.nn.functional.linear(x, weight, bias)
+
+
 def dispatch_unquantized_gemm() -> Callable[..., torch.Tensor]:
     if current_platform.is_rocm():
         return rocm_unquantized_gemm
-    return torch.nn.functional.linear
+    elif current_platform.is_cpu():
+        return cpu_unquantized_gemm
+    else:
+        return default_unquantized_gemm
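
The hunk above gives every GEMM variant the same `(layer, x, weight, bias)` signature, so the dispatcher can return any of them interchangeably and the callee can inspect per-layer attributes such as `use_cpu_sgl`. A minimal sketch of that pattern follows. It is pure Python and does not use torch: `reference_linear`, `packed_linear_stub`, the string `platform` argument, and the `CALLS` log are all hypothetical stand-ins for illustration, not vLLM APIs.

```python
from types import SimpleNamespace
from typing import Callable, List, Optional

Vector = List[float]
Matrix = List[Vector]

# Records which kernel ran; exists only so the sketch is observable.
CALLS: List[str] = []


def reference_linear(x: Matrix, weight: Matrix,
                     bias: Optional[Vector] = None) -> Matrix:
    """Pure-Python stand-in for a linear layer: y = x @ weight^T + bias."""
    out = [[sum(a * w for a, w in zip(row, w_row)) for w_row in weight]
           for row in x]
    if bias is not None:
        out = [[v + b for v, b in zip(row, bias)] for row in out]
    return out


def packed_linear_stub(x: Matrix, weight: Matrix,
                       bias: Optional[Vector] = None) -> Matrix:
    """Hypothetical stand-in for a packed-weight kernel (same math here)."""
    CALLS.append("packed")
    return reference_linear(x, weight, bias)


def default_gemm(layer, x, weight, bias=None) -> Matrix:
    # `layer` is unused, but kept so all variants share one signature.
    CALLS.append("default")
    return reference_linear(x, weight, bias)


def cpu_gemm(layer, x, weight, bias=None) -> Matrix:
    # A per-layer flag selects the specialised path; absent means fallback.
    if getattr(layer, "use_cpu_sgl", False):
        return packed_linear_stub(x, weight, bias)
    CALLS.append("cpu_fallback")
    return reference_linear(x, weight, bias)


def dispatch_gemm(platform: str) -> Callable[..., Matrix]:
    """Pick a GEMM implementation once, based on an assumed platform name."""
    if platform == "cpu":
        return cpu_gemm
    return default_gemm


x = [[1.0, 2.0]]
weight = [[3.0, 4.0], [5.0, 6.0]]
bias = [0.5, -0.5]

y_packed = dispatch_gemm("cpu")(SimpleNamespace(use_cpu_sgl=True), x, weight, bias)
y_plain = dispatch_gemm("cpu")(SimpleNamespace(), x, weight, bias)
y_default = dispatch_gemm("cuda")(SimpleNamespace(), x, weight, bias)
```

Because the layer is passed as the first argument, a call site written against the old three-argument form would bind `x` to `layer`; this is why the call in vocab_parallel_embedding.py has to change in the same commit.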

vllm/model_executor/layers/vocab_parallel_embedding.py @ 6cc1e7d9
...
@@ -43,7 +43,7 @@ class UnquantizedEmbeddingMethod(QuantizeMethodBase):
               layer: torch.nn.Module,
               x: torch.Tensor,
               bias: Optional[torch.Tensor] = None) -> torch.Tensor:
-        return dispatch_unquantized_gemm()(x, layer.weight, bias)
+        return dispatch_unquantized_gemm()(layer, x, layer.weight, bias)

     def embedding(self, layer: torch.nn.Module,
                   input_: torch.Tensor) -> torch.Tensor:
...

vllm/platforms/cpu.py @ 6cc1e7d9
...
@@ -194,6 +194,8 @@ class CpuPlatform(Platform):
                 "epilogue_fusion": True,
             })
+            if compilation_config.use_inductor:
+                compilation_config.custom_ops = ["none"]

         if vllm_config.lora_config is not None:
             compilation_config.level = CompilationLevel.NO_COMPILATION
...
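
The cpu.py hunk sets `custom_ops` to `["none"]` when Inductor is in use. As a rough illustration of what such a list might mean, the sketch below assumes a simplified rule: `"none"` turns custom ops off by default, `"all"` turns them on, and `"+name"` / `"-name"` entries override the default for one op. This rule is an assumption for illustration, and `op_enabled` and the op name `rms_norm` are hypothetical; the actual resolution logic lives in vLLM and may consider additional settings.

```python
from typing import List


def op_enabled(op_name: str, custom_ops: List[str]) -> bool:
    """Resolve whether one custom op is enabled (assumed, simplified rule)."""
    # Ops default to on unless "none" is listed and "all" is not.
    default_on = "none" not in custom_ops or "all" in custom_ops
    explicitly_on = f"+{op_name}" in custom_ops
    explicitly_off = f"-{op_name}" in custom_ops
    return (default_on or explicitly_on) and not explicitly_off


# With ["none"], as assigned in the hunk above, every op resolves to disabled.
disabled_by_none = op_enabled("rms_norm", ["none"])
reenabled = op_enabled("rms_norm", ["none", "+rms_norm"])
```

Under this assumed rule, a single op could still be re-enabled by listing it with a `+` prefix alongside `"none"`.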