vllm fix check on max vocab size (#22471)

Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>

81eea3d3 · Xiaodong Wang · GitHub · 9701352e · 81eea3d3
Commit 81eea3d3 (unverified), authored Aug 31, 2025 by Xiaodong Wang; committed by GitHub Aug 31, 2025

Showing with 13 additions and 1 deletion

vllm/v1/engine/processor.py +13 -1

--- a/vllm/v1/engine/processor.py
+++ b/vllm/v1/engine/processor.py
@@ -470,7 +470,19 @@ class Processor:
        else:
            tokenizer = self.tokenizer.get_lora_tokenizer(lora_request)
            max_input_id = max(prompt_ids, default=0)
-            if max_input_id > tokenizer.max_token_id:
+
+            # NOTE: tokenizer.max_token_id is the tokenizer’s vocab size while
+            # self.model_config.get_vocab_size() is the model’s vocab size.
+            # For Qwen3 models, the language model has extra tokens that do
+            # not exist in the tokenizer, and vice versa for multimodal
+            # placeholder tokens in some multimodal models.
+            # See https://github.com/QwenLM/Qwen3/issues/29#issuecomment-1933720399 # noqa: E501
+            # and https://github.com/vllm-project/vllm/pull/22471#discussion_r2312251421 # noqa: E501
+
+            # Here we take the max of the two to determine if a token id is
+            # truly out-of-vocabulary.
+            if max_input_id > max(tokenizer.max_token_id,
+                                  self.model_config.get_vocab_size() - 1):
                raise ValueError(
                    f"Token id {max_input_id} is out of vocabulary")
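
Illustration of the new bound (not part of the commit): a minimal standalone sketch of the check, assuming only what the diff shows. The helper name `is_out_of_vocab` and the sizes used below are hypothetical and chosen purely to exercise both directions of the mismatch described in the comment: a model vocabulary larger than the tokenizer's, and a tokenizer with ids beyond the model's vocabulary.

```python
from typing import Sequence


def is_out_of_vocab(prompt_ids: Sequence[int],
                    tokenizer_max_token_id: int,
                    model_vocab_size: int) -> bool:
    """Hypothetical helper mirroring the condition in the diff above.

    A token id is treated as out of vocabulary only if it exceeds BOTH
    the tokenizer's largest token id and the model's largest valid id
    (model_vocab_size - 1).
    """
    # An empty prompt yields 0, matching max(prompt_ids, default=0).
    max_input_id = max(prompt_ids, default=0)
    upper_bound = max(tokenizer_max_token_id, model_vocab_size - 1)
    return max_input_id > upper_bound


# Made-up sizes: the model has more ids (0..127) than the tokenizer (0..99).
accepted_model_only_id = is_out_of_vocab([5, 120], 99, 128)
rejected_beyond_both = is_out_of_vocab([5, 128], 99, 128)

# Made-up sizes: the tokenizer has ids (up to 130) beyond the model (0..127).
accepted_tokenizer_only_id = is_out_of_vocab([130], 130, 128)

empty_prompt = is_out_of_vocab([], 99, 128)
```

With the old condition (`max_input_id > tokenizer.max_token_id`), the first case would have been rejected, since 120 exceeds the tokenizer's largest id even though it is a valid id for the model.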