Unverified commit 0fefd00e authored by Sage, committed by GitHub

[Bugfix] Fix render server crash for quantized models on CPU-only hosts (#37215)


Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
parent f5c081d4
@@ -116,6 +116,11 @@ async def run_launch_fastapi(args: argparse.Namespace) -> None:
     # 2. Build and serve the API server
     engine_args = AsyncEngineArgs.from_cli_args(args)
     model_config = engine_args.create_model_config()
+    # Render servers preprocess data only — no inference, no quantized kernels.
+    # Clear quantization so VllmConfig skips quant dtype/capability validation.
+    model_config.quantization = None
     vllm_config = VllmConfig(model_config=model_config)
     shutdown_task = await build_and_serve_renderer(
         vllm_config, listen_address, sock, args
...
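
The idea behind the change can be illustrated with a minimal, self-contained sketch. It does not use vLLM's actual classes: `SketchModelConfig`, `SketchVllmConfig`, `build_render_config`, and the `has_gpu` flag are hypothetical stand-ins, and the validation rule shown (quantization requires an accelerator) is an assumption chosen only to mimic the kind of check that the commit message says crashed on CPU-only hosts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchModelConfig:
    """Hypothetical stand-in for a model config; not vLLM's ModelConfig."""

    model: str
    quantization: Optional[str] = None


class SketchVllmConfig:
    """Hypothetical stand-in that validates quantization when constructed."""

    def __init__(self, model_config: SketchModelConfig, has_gpu: bool) -> None:
        # Assumed rule for illustration: a quantized model needs an accelerator.
        if model_config.quantization is not None and not has_gpu:
            raise ValueError(
                f"{model_config.quantization} quantization is not supported "
                "on this host"
            )
        self.model_config = model_config


def build_render_config(
    model_config: SketchModelConfig, has_gpu: bool
) -> SketchVllmConfig:
    # Mirror the fix: a render-only server never runs quantized kernels,
    # so clear the setting before the validating constructor sees it.
    model_config.quantization = None
    return SketchVllmConfig(model_config, has_gpu=has_gpu)
```

In this sketch, constructing `SketchVllmConfig` directly with a quantized config and `has_gpu=False` raises `ValueError`, while `build_render_config` succeeds because the quantization field is cleared first. Note that the helper mutates the config it receives, as the patched code does; a caller that still needs the original value would have to copy the config beforehand.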