docs: update the doc for vLLM agg serving to fit XPU (#7078)

Signed-off-by: Yi Yao <yi.a.yao@intel.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

Unverified Commit 70e1ec1d authored Mar 25, 2026 by Yi Yao Committed by GitHub Mar 25, 2026

Showing with 8 additions and 0 deletions

docs/backends/vllm/vllm-examples.md +8 -0

--- a/docs/backends/vllm/vllm-examples.md
+++ b/docs/backends/vllm/vllm-examples.md
@@ -45,6 +45,14 @@ cd $DYNAMO_HOME/examples/backends/vllm
 bash launch/agg.sh
 ```
+For XPU deployments, set the block size to at least `64`:
+```bash
+# XeTLA ChunkPrefill FP8KV only supports block_size >= 64
+cd $DYNAMO_HOME/examples/backends/vllm
+bash launch/agg.sh --block-size 64
+```
 ### Aggregated Serving with KV Routing
 Two workers behind a [KV-aware router](../../components/router/README.md) that maximizes cache reuse. Requires 2 GPUs.
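
The XPU block-size requirement added in the hunk above could be guarded before launch. The following is a minimal sketch, not part of the commit or the Dynamo repository: the function name `check_xpu_block_size` is hypothetical, and the minimum of 64 is taken from the added doc text.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Dynamo repo): checks that a block size
# meets the XPU minimum of 64 stated in the added documentation.
check_xpu_block_size() {
  local block_size="$1"
  local min_block_size=64

  # Reject empty or non-numeric input before the numeric comparison.
  case "$block_size" in
    '' | *[!0-9]*)
      echo "block size must be a non-negative integer" >&2
      return 2
      ;;
  esac

  if [ "$block_size" -lt "$min_block_size" ]; then
    echo "block size $block_size is below the XPU minimum of $min_block_size" >&2
    return 1
  fi
  return 0
}
```

One way to use it, assuming the working directory is `$DYNAMO_HOME/examples/backends/vllm`, is `check_xpu_block_size 64 && bash launch/agg.sh --block-size 64`, so the launch script only runs when the check passes.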