Commit 70e1ec1d authored by Yi Yao, committed by GitHub

docs: update the doc for vLLM agg serving to fit XPU (#7078)


Signed-off-by: Yi Yao <yi.a.yao@intel.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
parent 1368ccd6
@@ -45,6 +45,14 @@ cd $DYNAMO_HOME/examples/backends/vllm
bash launch/agg.sh
```
For XPU deployments, set the block size to at least `64`:
```bash
# XeTLA ChunkPrefill FP8KV only supports block_size >= 64
cd $DYNAMO_HOME/examples/backends/vllm
bash launch/agg.sh --block-size 64
```
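The block-size constraint above can be checked before launching. The following is a minimal sketch, not part of Dynamo or vLLM: `validate_block_size` is a hypothetical helper name, and the minimum of `64` is taken from the XPU note above.

```bash
# Hypothetical pre-flight check; validate_block_size is an illustrative name,
# not a Dynamo or vLLM command. It only enforces block_size >= 64 for XPU.
validate_block_size() {
  local block_size="$1"
  local min_block_size=64

  # Reject empty or non-numeric input before doing an integer comparison.
  case "$block_size" in
    ''|*[!0-9]*)
      echo "block size must be a non-negative integer" >&2
      return 2
      ;;
  esac

  if [ "$block_size" -lt "$min_block_size" ]; then
    echo "block size $block_size is below the XPU minimum of $min_block_size" >&2
    return 1
  fi
  return 0
}

# Usage (commented out so sourcing this snippet does not launch anything):
# BLOCK_SIZE=64
# validate_block_size "$BLOCK_SIZE" && bash launch/agg.sh --block-size "$BLOCK_SIZE"
```

The check is kept separate from the launch so it can run in CI or a wrapper script without starting a worker.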
### Aggregated Serving with KV Routing
Two workers behind a [KV-aware router](../../components/router/README.md) that maximizes cache reuse. Requires 2 GPUs.