fix(xpu): Re-compute compile ranges after platform-specific config updates (#37523)
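
The ordering issue can be illustrated with a minimal sketch. It is a toy model, not vLLM's implementation: `ToyConfig`, `set_compile_ranges`, `platform_update`, the split points, and the 1024-token cap are all hypothetical names and values chosen for illustration. It only assumes that the compile ranges are derived from `max_num_batched_tokens`, and that a platform hook may lower that value afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a config object; not vLLM's VllmConfig."""

    max_num_batched_tokens: int = 8192
    split_points: list[int] = field(default_factory=lambda: [256, 2048])
    compile_ranges: list[tuple[int, int]] = field(default_factory=list)

    def set_compile_ranges(self) -> None:
        # Derive (start, end) ranges covering 1..max_num_batched_tokens,
        # dropping split points at or above the current limit.
        ends = sorted(
            p for p in self.split_points if p < self.max_num_batched_tokens
        )
        ends.append(self.max_num_batched_tokens)
        starts = [1] + [end + 1 for end in ends[:-1]]
        self.compile_ranges = list(zip(starts, ends))


def platform_update(config: ToyConfig) -> None:
    # Hypothetical platform hook that lowers the token budget.
    config.max_num_batched_tokens = min(config.max_num_batched_tokens, 1024)


def build(ranges_after_platform_update: bool) -> ToyConfig:
    config = ToyConfig()
    if not ranges_after_platform_update:
        # Old order: ranges are derived from the pre-update limit.
        config.set_compile_ranges()
    platform_update(config)
    if ranges_after_platform_update:
        # New order: ranges are derived from the final limit.
        config.set_compile_ranges()
    return config


stale = build(ranges_after_platform_update=False)
fresh = build(ranges_after_platform_update=True)
```

In this toy model, `stale.compile_ranges` still ends at the original limit even though `max_num_batched_tokens` was lowered afterwards, while `fresh.compile_ranges` ends at the lowered limit. Computing the ranges after the platform hook keeps the two consistent.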

Signed-off-by: Yuxiang Liang <yuxiang.liang@intel.com>
Signed-off-by: Yuxiang Liang <yuliang@habana.ai>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Commit 638a872d, authored Mar 20, 2026 by Yuxiang Liang, committed by GitHub Mar 20, 2026

Showing with 4 additions and 2 deletions

vllm/config/vllm.py +4 -2

--- a/vllm/config/vllm.py
+++ b/vllm/config/vllm.py
@@ -985,8 +985,6 @@ class VllmConfig:
                "--kv-sharing-fast-prefill requires changes on model side for "
                "correctness and to realize prefill savings."
            )
-        # TODO: Move after https://github.com/vllm-project/vllm/pull/26847 lands
-        self._set_compile_ranges()

        if (
            self.model_config
@@ -1022,6 +1020,10 @@ class VllmConfig:
            )
        current_platform.check_and_update_config(self)

+        # Re-compute compile ranges after platform-specific config updates
+        # (e.g., XPU may lower max_num_batched_tokens when MLA is enabled)
+        self._set_compile_ranges()
+
        # Do this after all the updates to compilation_config.mode
        effective_dp_size = (
            self.parallel_config.data_parallel_size