OpenDAS / vllm_cscc
Commit d7a4f220 (Unverified)
Authored Nov 11, 2024 by Woosuk Kwon. Committed by GitHub, Nov 11, 2024.
[V1] Do not use inductor for piecewise CUDA graphs (#10225)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
parent f9dadfbe
Showing 1 changed file with 3 additions and 4 deletions
vllm/v1/worker/gpu_model_runner.py
@@ -404,15 +404,14 @@ class GPUModelRunner:
 
     def load_model(self) -> None:
         if self.use_cuda_graph:
-            # FIXME(woosuk): Currently, the custom ops are not supported
-            # in the piecewise compilation mode. We rely on TorchInductor
-            # to optimize the model.
+            # FIXME(woosuk): Currently, we do not use inductor to reduce the
+            # compilation time and any potential issues with the inductor.
             os.environ["VLLM_CUSTOM_OPS"] = "none"
             set_compilation_config(
                 CompilationConfig(
                     use_cudagraph=True,
                     non_cudagraph_ops=["vllm.unified_v1_flash_attention"],
-                    use_inductor=True,
+                    use_inductor=False,
                 ))
 
         logger.info("Starting to load model %s...", self.model_config.model)
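For readers who want to see the effect of the change in isolation, here is a minimal standalone sketch. It does not import vLLM. The `CompilationConfig` dataclass below is a hypothetical stand-in that mirrors only the three keyword arguments visible in the diff, and `build_piecewise_config` is an illustrative helper name, not a function from the repository.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompilationConfig:
    """Hypothetical stand-in: only the three fields that appear in the diff."""
    use_cudagraph: bool
    non_cudagraph_ops: List[str]
    use_inductor: bool


def build_piecewise_config(use_cuda_graph: bool) -> Optional[CompilationConfig]:
    """Illustrative helper: return the config the patched branch would install.

    Returns None when CUDA graphs are disabled, since the diff only sets a
    compilation config inside the `if self.use_cuda_graph:` branch.
    """
    if not use_cuda_graph:
        return None
    # This environment setting is a context line in the diff (not changed).
    os.environ["VLLM_CUSTOM_OPS"] = "none"
    return CompilationConfig(
        use_cudagraph=True,
        # The attention op is listed as excluded from CUDA graph capture.
        non_cudagraph_ops=["vllm.unified_v1_flash_attention"],
        # The one behavioral change in this commit: True -> False.
        use_inductor=False,
    )


if __name__ == "__main__":
    config = build_piecewise_config(use_cuda_graph=True)
    print(config)
```

The only value that differs from the parent commit is `use_inductor`. The CUDA graph flag, the excluded attention op, and the `VLLM_CUSTOM_OPS` setting are carried over as they appear in the context lines.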