[Doc] Added warning of speculating with draft model (#22047)

Signed-off-by: Dilute-l <dilu2333@163.com>
Co-authored-by: Dilute-l <dilu2333@163.com>

Commit 49314869, authored Aug 01, 2025 by WeiQing Chen, committed by GitHub Aug 01, 2025

Showing 1 changed file with 4 additions and 0 deletions

docs/features/spec_decode.md +4 -0

--- a/docs/features/spec_decode.md
+++ b/docs/features/spec_decode.md
@@ -15,6 +15,10 @@ Speculative decoding is a technique which improves inter-token latency in memory
 The following code configures vLLM in an offline mode to use speculative decoding with a draft model, speculating 5 tokens at a time.
+!!! warning
+    In vllm v0.10.0, speculative decoding with a draft model is not supported.
+    If you use the following code, you will get a `NotImplementedError`.
 ??? code
    ```python