[Frontend] Skip unnecessary detokenization when token_id is requested (#24236)
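
Before this change, `tokenizer.decode(token_id)` ran unconditionally and its result was discarded whenever `should_return_as_token_id` was set. The patch moves the decode into the `else` branch so it runs only when the token text is actually returned. A minimal sketch of that control flow, assuming a tokenizer that exposes `decode(token_id)`; `CountingTokenizer` and `token_repr` are hypothetical names used for illustration and are not part of vLLM:

```python
class CountingTokenizer:
    """Hypothetical stand-in for a real tokenizer; counts decode() calls."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.decode_calls = 0

    def decode(self, token_id):
        self.decode_calls += 1
        return self.vocab[token_id]


def token_repr(tokenizer, token_id, should_return_as_token_id):
    # Mirrors the patched branch: decode only when the text is needed.
    if should_return_as_token_id:
        return f"token_id:{token_id}"
    return tokenizer.decode(token_id)


tok = CountingTokenizer({7: "hello"})
as_id = token_repr(tok, 7, True)     # no decode call on this path
calls_after_id = tok.decode_calls
as_text = token_repr(tok, 7, False)  # decode is called exactly once here
```

The returned values are the same as before the patch; only the wasted decode call on the `token_id` path is avoided.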

Signed-off-by: NickLucche <nlucches@redhat.com>

Commit 65e03893, authored Sep 05, 2025 by Nicolò Lucchesi; committed by GitHub Sep 04, 2025

Showing with 2 additions and 1 deletion

vllm/entrypoints/openai/serving_chat.py +2 -1

--- a/vllm/entrypoints/openai/serving_chat.py
+++ b/vllm/entrypoints/openai/serving_chat.py
@@ -1419,9 +1419,10 @@ class OpenAIServingChat(OpenAIServing):
            step_top_logprobs = top_logprobs[i]
            if step_top_logprobs is None or step_top_logprobs.get(
                    token_id) is None:
-                token = tokenizer.decode(token_id)
                if should_return_as_token_id:
                    token = f"token_id:{token_id}"
+                else:
+                    token = tokenizer.decode(token_id)
                logprobs_content.append(
                    ChatCompletionLogProbsContent(