docs: Add cancellation docs for vLLM, TRT-LLM and SGLang backends (#3783)

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

docs: Add cancellation docs for vLLM, TRT-LLM and SGLang backends (#3783)
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
22d22777 · Jacky · GitHub · 8ad3b9a2 · 22d22777 · 22d22777
Unverified Commit 22d22777 authored Oct 21, 2025 by Jacky Committed by GitHub Oct 21, 2025
Showing with 43 additions and 0 deletions

docs/backends/sglang/README.md docs/backends/sglang/README.md +16 -0

docs/backends/trtllm/README.md docs/backends/trtllm/README.md +14 -0

docs/backends/vllm/README.md docs/backends/vllm/README.md +13 -0

No files found.
--- a/docs/backends/sglang/README.md
+++ b/docs/backends/sglang/README.md
@@ -69,6 +69,22 @@ Dynamo SGLang uses SGLang's native argument parser, so **most SGLang engine argu
 > [!NOTE]
 > When using `--use-sglang-tokenizer`, only `v1/chat/completions` is available through Dynamo's frontend.
+### Request Cancellation
+When a user cancels a request (e.g., by disconnecting from the frontend), the request is automatically cancelled across all workers, freeing compute resources for other requests.
+#### Cancellation Support Matrix
+| | Prefill | Decode |
+|-|---------|--------|
+| **Aggregated** | ✅ | ✅ |
+| **Disaggregated** | ⚠️ | ✅ |
+> [!WARNING]
+> ⚠️ SGLang backend currently does not support cancellation during remote prefill phase in disaggregated mode.
+For more details, see the [Request Cancellation Architecture](../../architecture/request_cancellation.md) documentation.
 ## Installation
 ### Install latest release

--- a/docs/backends/trtllm/README.md
+++ b/docs/backends/trtllm/README.md
@@ -228,6 +228,20 @@ python3 -m dynamo.trtllm ... --migration-limit=3
 This allows a request to be migrated up to 3 times before failing. See the [Request Migration Architecture](../../../docs/architecture/request_migration.md) documentation for details on how this works.
+## Request Cancellation
+When a user cancels a request (e.g., by disconnecting from the frontend), the request is automatically cancelled across all workers, freeing compute resources for other requests.
+### Cancellation Support Matrix
+| | Prefill | Decode |
+|-|---------|--------|
+| **Aggregated** | ✅ | ✅ |
+| **Disaggregated (Decode-First)** | ✅ | ✅ |
+| **Disaggregated (Prefill-First)** | ✅ | ✅ |
+For more details, see the [Request Cancellation Architecture](../../../docs/architecture/request_cancellation.md) documentation.
 ## Client
 See [client](../../../docs/backends/sglang/README.md#testing-the-deployment) section to learn how to send request to the deployment.

--- a/docs/backends/vllm/README.md
+++ b/docs/backends/vllm/README.md
@@ -189,3 +189,16 @@ python3 -m dynamo.vllm ... --migration-limit=3
 ```
 This allows a request to be migrated up to 3 times before failing. See the [Request Migration Architecture](../../../docs/architecture/request_migration.md) documentation for details on how this works.
+## Request Cancellation
+When a user cancels a request (e.g., by disconnecting from the frontend), the request is automatically cancelled across all workers, freeing compute resources for other requests.
+### Cancellation Support Matrix
+| | Prefill | Decode |
+|-|---------|--------|
+| **Aggregated** | ✅ | ✅ |
+| **Disaggregated** | ✅ | ✅ |
+For more details, see the [Request Cancellation Architecture](../../../docs/architecture/request_cancellation.md) documentation.