docs: alphabetize backends (SGLang, TensorRT-LLM, vLLM) (#6537)

Signed-off-by: Dan Gil <dagil@nvidia.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Commit e0373bd7 (unverified), authored Feb 25, 2026 by dagil-nvidia, committed by GitHub Feb 25, 2026
10 changed files
--- a/docs/pages/integrations/flexkv-integration.md
+++ b/docs/pages/integrations/flexkv-integration.md
@@ -8,7 +8,7 @@ title: FlexKV

 ## Introduction

-[FlexKV](https://github.com/taco-project/FlexKV) is a scalable, distributed runtime for KV cache offloading developed by Tencent Cloud's TACO team in collaboration with the community. It acts as a unified KV caching layer for inference engines like vLLM, TensorRT-LLM, and SGLang.
+[FlexKV](https://github.com/taco-project/FlexKV) is a scalable, distributed runtime for KV cache offloading developed by Tencent Cloud's TACO team in collaboration with the community. It acts as a unified KV caching layer for inference engines like SGLang, TensorRT-LLM, and vLLM.

 ### Key Features


--- a/docs/pages/integrations/kv-events-custom-engines.md
+++ b/docs/pages/integrations/kv-events-custom-engines.md
@@ -17,7 +17,7 @@ Events are published over the **Dynamo event plane**, a transport-agnostic pub/s
 `KvEventPublisher` supports two publishing modes:

 1. **Direct publishing** — Your engine calls `publish_stored()` / `publish_removed()` to push events directly over the event plane. Simplest approach for custom engines.
-2. **ZMQ relay** — For engines that emit raw KV events over a ZMQ socket (like vLLM and SGLang). The publisher subscribes to the ZMQ endpoint and relays events to the event plane automatically.
+2. **ZMQ relay** — For engines that emit raw KV events over a ZMQ socket (like SGLang and vLLM). The publisher subscribes to the ZMQ endpoint and relays events to the event plane automatically.

 ## Event Types

@@ -137,11 +137,11 @@ async def main():

 ## ZMQ Relay (For Engines with Raw KV Events)

-For engines that already publish raw KV events over a ZMQ socket (like vLLM and SGLang), use the same `KvEventPublisher` with a `zmq_endpoint`. The publisher subscribes to the ZMQ socket and relays events to the event plane automatically.
+For engines that already publish raw KV events over a ZMQ socket (like SGLang and vLLM), use the same `KvEventPublisher` with a `zmq_endpoint`. The publisher subscribes to the ZMQ socket and relays events to the event plane automatically.

 ```mermaid
 flowchart LR
-    subgraph Engine["Custom Engine / vLLM / SGLang"]
+    subgraph Engine["Custom Engine / SGLang / vLLM"]
        cache["KV Cache Manager"]
        zmq_pub["ZMQ Publisher"]
    end
@@ -170,7 +170,7 @@ flowchart LR
 ```

 **When to use:**
- Your engine already publishes KV events via ZMQ (like vLLM or SGLang)
+- Your engine already publishes KV events via ZMQ (like SGLang or vLLM)
 - You want to decouple event publishing from your engine's main loop

 ### Setup
@@ -192,7 +192,7 @@ No further calls to `publish_stored()` / `publish_removed()` are needed — the

 ### ZMQ Wire Format

-The ZMQ message format (compatible with vLLM / SGLang):
+The ZMQ message format (compatible with SGLang / vLLM):

 | Frame | Description |
 |-------|-------------|

--- a/docs/pages/kubernetes/chrek/dynamo.md
+++ b/docs/pages/kubernetes/chrek/dynamo.md
@@ -212,7 +212,7 @@ Checkpoints are uniquely identified by a **16-character SHA256 hash** (64 bits)
 | Field | Required | Affects Hash | Example |
 |-------|----------|-------------|---------|
 | `model` | ✓ | ✓ | `meta-llama/Llama-3-8B` |
-| `framework` | ✓ | ✓ | `vllm`, `sglang`, `trtllm` |
+| `framework` | ✓ | ✓ | `sglang`, `trtllm`, `vllm` |
 | `dynamoVersion` | | ✓ | `0.9.0`, `1.0.0` |
 | `tensorParallelSize` | | ✓ | `1`, `2`, `4`, `8` (default: 1) |
 | `pipelineParallelSize` | | ✓ | `1`, `2` (default: 1) |

--- a/docs/pages/kubernetes/deployment/dynamomodel-guide.md
+++ b/docs/pages/kubernetes/deployment/dynamomodel-guide.md
@@ -437,7 +437,7 @@ status:
   - For HuggingFace: Verify token is valid, repo exists and is accessible

 2. **Invalid LoRA format**
-   **Solution:** Ensure your LoRA weights are in the format expected by your backend framework (vLLM, SGLang, etc.)
+   **Solution:** Ensure your LoRA weights are in the format expected by your backend framework (SGLang, vLLM, etc.)

 3. **Endpoint API errors**
   ```bash

--- a/docs/pages/observability/metrics.md
+++ b/docs/pages/observability/metrics.md
@@ -194,7 +194,7 @@ curl -s localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
 **Timeline:**
 ```
 Timeline:    0, 1, ...
-Client ────> Frontend:8000 ────────────────────> Dynamo component/backend (vLLM, SGLang, TRT)
+Client ────> Frontend:8000 ────────────────────> Dynamo component/backend (SGLang, TRT, vLLM)
             │request start                     │received                              │
             |                                  |                                      |
             │                                  ├──> start prefill ──> first token ──> |last token

--- a/docs/pages/reference/feature-matrix.md
+++ b/docs/pages/reference/feature-matrix.md
@@ -16,20 +16,20 @@ This document provides a comprehensive compatibility matrix for key Dynamo featu

 ## Quick Comparison

-| Feature | vLLM | TensorRT-LLM | SGLang | Source |
+| Feature | SGLang | TensorRT-LLM | vLLM | Source |
 | :--- | :---: | :---: | :---: | :--- |
 | **Disaggregated Serving** | ✅ | ✅ | ✅ | [Design Doc][disagg] |
 | **KV-Aware Routing** | ✅ | ✅ | ✅ | [Router Doc][kv-routing] |
 | **SLA-Based Planner** | ✅ | ✅ | ✅ | [Planner Doc][planner] |
-| **KV Block Manager** | ✅ | ✅ | 🚧 | [KVBM Doc][kvbm] |
+| **KV Block Manager** | 🚧 | ✅ | ✅ | [KVBM Doc][kvbm] |
 | **Multimodal (Image)** | ✅ | ✅ | ✅ | [Multimodal Doc][mm] |
-| **Multimodal (Video)** | ✅ | | | [Multimodal Doc][mm] |
-| **Multimodal (Audio)** | 🚧 | | | [Multimodal Doc][mm] |
+| **Multimodal (Video)** | | | ✅ | [Multimodal Doc][mm] |
+| **Multimodal (Audio)** | | | 🚧 | [Multimodal Doc][mm] |
 | **Request Migration** | ✅ | 🚧 | ✅ | [Migration Doc][migration] |
-| **Request Cancellation** | ✅ | ✅ | 🚧 | Backend READMEs |
-| **LoRA** | ✅ | | | [K8s Guide][lora] |
+| **Request Cancellation** | 🚧 | ✅ | ✅ | Backend READMEs |
+| **LoRA** | | | ✅ | [K8s Guide][lora] |
 | **Tool Calling** | ✅ | ✅ | ✅ | [Tool Calling Doc][tools] |
-| **Speculative Decoding** | ✅ | ✅ | 🚧 | Backend READMEs |
+| **Speculative Decoding** | 🚧 | ✅ | ✅ | Backend READMEs |

 ## 1. vLLM Backend


--- a/docs/pages/reference/release-artifacts.md
+++ b/docs/pages/reference/release-artifacts.md
@@ -48,7 +48,7 @@ We recommend using the TensorRT-LLM NGC container instead of the `ai-dynamo[trtl

 | Package | Description | Python | Platform | PyPI |
 |---------|-------------|--------|----------|------|
-| `ai-dynamo==0.8.1` | Main package with backend integrations (vLLM, SGLang, TRT-LLM) | `3.10`–`3.12` | Linux (glibc `v2.28+`) | [link](https://pypi.org/project/ai-dynamo/0.8.1/) |
+| `ai-dynamo==0.8.1` | Main package with backend integrations (SGLang, TRT-LLM, vLLM) | `3.10`–`3.12` | Linux (glibc `v2.28+`) | [link](https://pypi.org/project/ai-dynamo/0.8.1/) |
 | `ai-dynamo-runtime==0.8.1` | Core Python bindings for Dynamo runtime | `3.10`–`3.12` | Linux (glibc `v2.28+`) | [link](https://pypi.org/project/ai-dynamo-runtime/0.8.1/) |
 | `kvbm==0.8.1` | KV Block Manager for disaggregated KV cache | `3.12` | Linux (glibc `v2.28+`) | [link](https://pypi.org/project/kvbm/0.8.1/) |

@@ -75,7 +75,7 @@ We recommend using the TensorRT-LLM NGC container instead of the `ai-dynamo[trtl

 ### Container Images (NGC)

-> For detailed run instructions, see the [Container README](https://github.com/ai-dynamo/dynamo/tree/main/container/README.md) or backend-specific guides: [vLLM](../backends/vllm/README.md) | [SGLang](../backends/sglang/README.md) | [TensorRT-LLM](../backends/trtllm/README.md)
+> For detailed run instructions, see the [Container README](https://github.com/ai-dynamo/dynamo/tree/main/container/README.md) or backend-specific guides: [SGLang](../backends/sglang/README.md) | [TensorRT-LLM](../backends/trtllm/README.md) | [vLLM](../backends/vllm/README.md)

 ```bash
 # Runtime containers
@@ -158,7 +158,7 @@ For a complete list of known issues, refer to the release notes for each patch:

 - **v0.8.1.post1 Patch**: Updated TRT-LLM to `v1.2.0rc6.post2` (PyPI wheels and TRT-LLM container only)
 - **Standalone Frontend Container**: `dynamo-frontend` added in v0.8.0
- **CUDA 13 Runtimes**: Experimental CUDA 13 runtime for vLLM and SGLang in v0.8.0
+- **CUDA 13 Runtimes**: Experimental CUDA 13 runtime for SGLang and vLLM in v0.8.0
 - **New Rust Crates**: `dynamo-memory` and `dynamo-config` added in v0.8.0

 ### GitHub Releases

--- a/docs/pages/reference/support-matrix.md
+++ b/docs/pages/reference/support-matrix.md
@@ -14,7 +14,7 @@ This document provides the support matrix for Dynamo, including hardware, softwa

 The following table shows the backend framework versions included with each Dynamo release:

-| **Dynamo** | **vLLM** | **SGLang** | **TensorRT-LLM** | **NIXL** |
+| **Dynamo** | **SGLang** | **TensorRT-LLM** | **vLLM** | **NIXL** |
 | :--- | :--- | :--- | :--- | :--- |
 | **main (ToT)** | `0.15.1` | `0.5.9` | `1.3.0rc3` | `0.9.0` |
 | **v1.0.0** *(planned)* | `0.15.0` | *Latest as of 2/17* | *Latest as of 2/17* | `0.10.0` |
@@ -44,14 +44,14 @@ The following table shows the backend framework versions included with each Dyna

 ### CUDA Versions by Backend

-| **Dynamo** | **vLLM** | **SGLang** | **TensorRT-LLM** | **Notes** |
+| **Dynamo** | **SGLang** | **TensorRT-LLM** | **vLLM** | **Notes** |
 | :--- | :--- | :--- | :--- | :--- |
-| **v0.8.1** | `12.9`, `13.0` | `12.9`, `13.0` | `13.0` | Experimental vLLM/SGLang CUDA 13 support |
-| **v0.8.0** | `12.9`, `13.0` | `12.9`, `13.0` | `13.0` | Experimental vLLM/SGLang CUDA 13 support |
-| **v0.7.1** | `12.9` | `12.8` | `13.0` | |
-| **v0.7.0** | `12.8` | `12.9` | `13.0` | TensorRT-LLM CUDA 13 support - CUDA 12.9 deprecated |
-| **v0.6.1** | `12.8` | `12.9` | `12.9` | |
-| **v0.6.0** | `12.8` | `12.8` | `12.9` | |
+| **v0.8.1** | `12.9`, `13.0` | `13.0` | `12.9`, `13.0` | Experimental SGLang/vLLM CUDA 13 support |
+| **v0.8.0** | `12.9`, `13.0` | `13.0` | `12.9`, `13.0` | Experimental SGLang/vLLM CUDA 13 support |
+| **v0.7.1** | `12.8` | `13.0` | `12.9` | |
+| **v0.7.0** | `12.9` | `13.0` | `12.8` | TensorRT-LLM CUDA 13 support - CUDA 12.9 deprecated |
+| **v0.6.1** | `12.9` | `12.9` | `12.8` | |
+| **v0.6.0** | `12.8` | `12.9` | `12.8` | |

 Patch versions (e.g., v0.8.1.post1, v0.7.0.post1) have the same CUDA support as their base version.

@@ -101,22 +101,22 @@ Dynamo container images include CUDA toolkit libraries. The host machine must ha

 | Dynamo Version | Backend | CUDA Toolkit | Min Driver (Linux) | Min Driver (Windows) | Notes |
 | :--- | :--- | :--- | :--- | :--- | :--- |
-| **0.8.1** | **vLLM** | 12.9 | 575.xx+ | 576.xx+ | |
-| | | 13.0 | 580.xx+ | 581.xx+ | Experimental |
-| | **SGLang** | 12.9 | 575.xx+ | 576.xx+ | |
+| **0.8.1** | **SGLang** | 12.9 | 575.xx+ | 576.xx+ | |
 | | | 13.0 | 580.xx+ | 581.xx+ | Experimental |
 | | **TensorRT-LLM** | 13.0 | 580.xx+ | 581.xx+ | |
-| **0.8.0** | **vLLM** | 12.9 | 575.xx+ | 576.xx+ | |
+| | **vLLM** | 12.9 | 575.xx+ | 576.xx+ | |
 | | | 13.0 | 580.xx+ | 581.xx+ | Experimental |
-| | **SGLang** | 12.9 | 575.xx+ | 576.xx+ | |
+| **0.8.0** | **SGLang** | 12.9 | 575.xx+ | 576.xx+ | |
 | | | 13.0 | 580.xx+ | 581.xx+ | Experimental |
 | | **TensorRT-LLM** | 13.0 | 580.xx+ | 581.xx+ | |
-| **0.7.1** | **vLLM** | 12.9 | 575.xx+ | 576.xx+ | |
-| | **SGLang** | 12.8 | 570.xx+ | 571.xx+ | |
+| | **vLLM** | 12.9 | 575.xx+ | 576.xx+ | |
+| | | 13.0 | 580.xx+ | 581.xx+ | Experimental |
+| **0.7.1** | **SGLang** | 12.8 | 570.xx+ | 571.xx+ | |
 | | **TensorRT-LLM** | 13.0 | 580.xx+ | 581.xx+ | |
-| **0.7.0** | **vLLM** | 12.8 | 570.xx+ | 571.xx+ | |
-| | **SGLang** | 12.9 | 575.xx+ | 576.xx+ | |
+| | **vLLM** | 12.9 | 575.xx+ | 576.xx+ | |
+| **0.7.0** | **SGLang** | 12.9 | 575.xx+ | 576.xx+ | |
 | | **TensorRT-LLM** | 13.0 | 580.xx+ | 581.xx+ | |
+| | **vLLM** | 12.8 | 570.xx+ | 571.xx+ | |

 Experimental CUDA 13 images are not published for all versions. Check [Release Artifacts](release-artifacts.md) for availability.


--- a/docs/pages/templates/README.md
+++ b/docs/pages/templates/README.md
@@ -30,7 +30,7 @@ Templates for creating consistent Dynamo documentation.
 └──────────────────────────────────────────────────────────────┘
 ```

-### Backends (vLLM, SGLang, TRT-LLM)
+### Backends (SGLang, TRT-LLM, vLLM)

 ```
 ┌─────────────────────────────────────────────────────┐

--- a/lib/kv-router/README.md
+++ b/lib/kv-router/README.md
@@ -38,7 +38,7 @@ block2 in B: seq_hash = hash(hash(hash(block0') || block1') || block2) = 0x2222

 > **Important: Engine-Provided Hashes**
 >
-> In practice, the `ExternalSequenceBlockHash` may come directly from the inference engine (e.g., vLLM, TensorRT-LLM) using a rolling hash algorithm that we don't know or control. The engine computes these hashes internally and reports them via KV cache events.
+> In practice, the `ExternalSequenceBlockHash` may come directly from the inference engine (e.g., TensorRT-LLM, vLLM) using a rolling hash algorithm that we don't know or control. The engine computes these hashes internally and reports them via KV cache events.
 >
 > **LoRA identity**: The engine is responsible for incorporating the LoRA adapter identity into the `ExternalSequenceBlockHash` before emitting KV events. Dynamo does not add LoRA information at the router layer. For example, vLLM does this via `_gen_lora_extra_hash_keys`, which appends the LoRA ID as extra keys when calling `hash_block_tokens(..., extra_keys)`. Any engine integrating with the KV router must follow the same convention to ensure correct cache isolation between LoRA adapters.
 >
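
Reviewer note on the `lib/kv-router/README.md` context above: the chained form `seq_hash = hash(hash(hash(block0') || block1') || block2)` and the LoRA extra-key convention can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `chain_block_hashes`, the `extra_keys` parameter, the use of SHA-256, and the byte separators are hypothetical. As the README states, real engines compute these hashes with their own rolling hash algorithm, which Dynamo neither knows nor controls.

```python
import hashlib
from typing import Iterable, List, Optional, Sequence


def chain_block_hashes(
    blocks: Iterable[Sequence[int]],
    extra_keys: Optional[Sequence[str]] = None,
) -> List[int]:
    """Return one 64-bit sequence hash per block, each chained to its parent.

    Hypothetical illustration only; not the algorithm of any real engine.
    """
    parent = b""  # the first block has no parent hash
    hashes: List[int] = []
    for block in blocks:
        h = hashlib.sha256()
        h.update(parent)  # fold in the hash of everything before this block
        h.update(b"|".join(str(tok).encode() for tok in block))
        for key in extra_keys or ():
            # e.g. a LoRA adapter ID, so adapters never share cache entries
            h.update(b"#" + key.encode())
        parent = h.digest()
        hashes.append(int.from_bytes(parent[:8], "big"))
    return hashes


base = chain_block_hashes([[1, 2], [3, 4], [5, 6]])
diverged = chain_block_hashes([[1, 2], [9, 9], [5, 6]])
with_lora = chain_block_hashes([[1, 2], [3, 4], [5, 6]], extra_keys=["lora-7"])
```

In this sketch, `base` and `diverged` agree on the first block only: once a block differs, every later sequence hash differs too, even where the block tokens match. `with_lora` shares no hashes with `base`, which is the cache isolation between adapters that the README asks engines to provide.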