[4.5/N] bugfix for quant config in speculative decode (#10007)

Signed-off-by: youkaichao <youkaichao@gmail.com>

Commit 2094062b (unverified), authored Nov 04, 2024 by youkaichao; committed by GitHub Nov 04, 2024

Showing 1 changed file with 4 additions and 0 deletions

vllm/spec_decode/spec_decode_worker.py +4 -0

--- a/vllm/spec_decode/spec_decode_worker.py
+++ b/vllm/spec_decode/spec_decode_worker.py
@@ -61,6 +61,10 @@ def create_spec_worker(*args, **kwargs) -> "SpecDecodeWorker":
    draft_worker_config = copy.deepcopy(vllm_config)
    draft_worker_config.model_config = speculative_config.draft_model_config
+    draft_worker_config.quant_config = VllmConfig._get_quantization_config(
+        draft_worker_config.model_config,
+        vllm_config.load_config,
+    )
    draft_worker_config.parallel_config = speculative_config.draft_parallel_config  # noqa
    # TODO allow draft-model specific load config.
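
The hunk above recomputes quant_config right after model_config is swapped on the deep-copied config. A minimal sketch of the failure mode this appears to address, assuming the quantization config is derived from the model config when the config object is first built. The classes ModelConfig and Config and the helper derive_quant_config are hypothetical stand-ins for illustration, not vLLM's actual classes or signatures:

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    # Toy model config; "quantization" names the scheme, or None if unquantized.
    name: str
    quantization: Optional[str] = None


@dataclass
class Config:
    # Toy top-level config; quant_config is derived from model_config.
    model_config: ModelConfig
    quant_config: Optional[str] = None

    @staticmethod
    def derive_quant_config(model_config: ModelConfig) -> Optional[str]:
        # Hypothetical stand-in for VllmConfig._get_quantization_config.
        return model_config.quantization

    def __post_init__(self) -> None:
        # Derived once, at construction time only.
        self.quant_config = self.derive_quant_config(self.model_config)


target = Config(ModelConfig("target", quantization="fp8"))
draft_model = ModelConfig("draft", quantization=None)

# Copy the target config, then swap in the draft model's config.
draft = copy.deepcopy(target)
draft.model_config = draft_model

# The derived field still describes the target model.
stale_quant = draft.quant_config

# Recompute the derived field from the new model config, as the patch does.
draft.quant_config = Config.derive_quant_config(draft.model_config)
fixed_quant = draft.quant_config
```

In this sketch, assigning a new model_config does not rerun the derivation, so the copied quant_config keeps describing the target model until it is recomputed explicitly.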