OpenDAS / vllm_cscc
Commit 0fefd00e (unverified)
Authored Mar 16, 2026 by Sage
Committed by GitHub, Mar 16, 2026
[Bugfix] Fix render server crash for quantized models on CPU-only hosts (#37215)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Parent: f5c081d4
Showing 1 changed file with 5 additions and 0 deletions
vllm/entrypoints/cli/launch.py (+5, -0)
@@ -116,6 +116,11 @@ async def run_launch_fastapi(args: argparse.Namespace) -> None:
     # 2. Build and serve the API server
     engine_args = AsyncEngineArgs.from_cli_args(args)
     model_config = engine_args.create_model_config()
+
+    # Render servers preprocess data only — no inference, no quantized kernels.
+    # Clear quantization so VllmConfig skips quant dtype/capability validation.
+    model_config.quantization = None
+
     vllm_config = VllmConfig(model_config=model_config)
     shutdown_task = await build_and_serve_renderer(
         vllm_config, listen_address, sock, args
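The idea behind the change above can be illustrated with a small, self-contained sketch. It does not use vLLM: `FakeModelConfig`, `FakeCapabilityError`, `detect_gpu_capability`, `validate_quantization`, and `build_render_config` are hypothetical stand-ins invented for illustration, and the real validation inside `VllmConfig` may differ in detail. The sketch only assumes what the commit comment states: validation is tied to a quantization method being set, so clearing the field lets a preprocessing-only server start on a host without a GPU.

```python
from dataclasses import dataclass
from typing import Optional


class FakeCapabilityError(RuntimeError):
    """Hypothetical error standing in for a quantization capability failure."""


@dataclass
class FakeModelConfig:
    # Hypothetical stand-in for a model config; only the field that the
    # commit touches (quantization) is modeled here.
    model: str
    quantization: Optional[str] = None


def detect_gpu_capability() -> Optional[int]:
    """Simulate a CPU-only host: there is no GPU capability to report."""
    return None


def validate_quantization(config: FakeModelConfig) -> None:
    """Assumed validation step: it only runs when quantization is set."""
    if config.quantization is None:
        return  # Nothing to validate, so a CPU-only host is fine.
    if detect_gpu_capability() is None:
        raise FakeCapabilityError(
            f"{config.quantization} quantization needs a GPU, none was found"
        )


def build_render_config(config: FakeModelConfig) -> FakeModelConfig:
    """Mirror the fix: clear quantization before validation runs."""
    # A render server only preprocesses data, so no quantized kernels are used.
    config.quantization = None
    validate_quantization(config)
    return config


if __name__ == "__main__":
    cfg = FakeModelConfig(model="some/quantized-model", quantization="fp8")
    try:
        validate_quantization(cfg)
    except FakeCapabilityError as exc:
        print(f"without the fix: {exc}")
    fixed = build_render_config(cfg)
    print(f"with the fix: quantization={fixed.quantization}")
```

Clearing the field on the config object, rather than special-casing the validator, keeps the workaround local to the render-server launch path, which matches where the commit places it.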