Commit 3dad13fb authored by gaoqiong
Change the default value of full_cuda_graph to False

parent 04b61f0e
@@ -4106,7 +4106,7 @@ class CompilationConfig:
     are always used, it can set this to False. Otherwise, it should
     set this to True, and the compiler will copy the input to an
     internally managed buffer. Default is False."""
-    full_cuda_graph: bool = True
+    full_cuda_graph: bool = False
     """whether to use a full cuda graph for the entire forward pass rather than
     splitting certain operations such as attention into subgraphs. Thus this
     flag cannot be used together with splitting_ops. This may provide
@@ -4948,4 +4948,4 @@ def get_layers_from_vllm_config(vllm_config: VllmConfig,
         for layer_name, layer in
         vllm_config.compilation_config.static_forward_context.items()
         if isinstance(layer, layer_type)
     }
\ No newline at end of file
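
Reviewer note: the practical effect of the first hunk is that full CUDA graph capture becomes opt-in rather than the default. The sketch below illustrates that with a hypothetical stand-in class, `SketchCompilationConfig`, which is not the real `CompilationConfig`. It models only the two fields named in the docstring, and the `__post_init__` check is an assumption about how the "cannot be used together with splitting_ops" rule might be enforced; the diff does not show the actual validation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchCompilationConfig:
    """Hypothetical stand-in for the two fields discussed in this commit."""

    # New default after this commit: a full CUDA graph must be requested.
    full_cuda_graph: bool = False
    # Ops at which the forward pass is split into subgraphs (e.g. attention).
    splitting_ops: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumed enforcement of the docstring's rule; the real class may
        # handle this combination differently.
        if self.full_cuda_graph and self.splitting_ops:
            raise ValueError(
                "full_cuda_graph cannot be used together with splitting_ops")


# With the new default, callers get the split-graph behaviour unless they opt in.
default_cfg = SketchCompilationConfig()
opt_in_cfg = SketchCompilationConfig(full_cuda_graph=True)
```

Callers that relied on the old default would need to pass `full_cuda_graph=True` explicitly after this change.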