docs: add feature compatibility matrix and update support matrix for v0.8.0 (#5349)

Signed-off-by: Dan Gil <dagil@nvidia.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>

Commit 4ed8584a (unverified), authored Jan 13, 2026 by dagil-nvidia, committed by GitHub Jan 13, 2026

Showing 3 changed files with 158 additions and 16 deletions:

README.md +10 -6
docs/reference/support-matrix.md +15 -10
feature-matrix.md +133 -0

--- a/README.md
+++ b/README.md
@@ -31,11 +31,15 @@ High-throughput, low-latency inference framework designed for serving generative
 ## Framework Support Matrix

 | Feature                                                              | [vLLM](docs/backends/vllm/README.md) | [SGLang](docs/backends/sglang/README.md) | [TensorRT-LLM](docs/backends/trtllm/README.md) |
-| ------------------------------------------------------------------------------------------------- | ---- | ------ | ------------ |
-| [**Disaggregated Serving**](/docs/design_docs/disagg_serving.md)                                 | ✅   | ✅     | ✅           |
-| [**KV-Aware Routing**](/docs/router/kv_cache_routing.md)                                    | ✅   | ✅     | ✅           |
+| -------------------------------------------------------------------- | :--: | :----: | :----------: |
+| [**Disaggregated Serving**](docs/design_docs/disagg_serving.md)      | ✅   | ✅     | ✅           |
+| [**KV-Aware Routing**](docs/router/kv_cache_routing.md)              | ✅   | ✅     | ✅           |
 | [**SLA-Based Planner**](docs/planner/sla_planner.md)                 | ✅   | ✅     | ✅           |
 | [**KVBM**](docs/kvbm/kvbm_architecture.md)                           | ✅   | 🚧     | ✅           |
+| [**Multimodal**](docs/multimodal/index.md)                           | ✅   | ✅     | ✅           |
+| [**Tool Calling**](docs/agents/tool-calling.md)                      | ✅   | ✅     | ✅           |
+
+> **[Full Feature Matrix →](feature-matrix.md)** — Detailed compatibility including LoRA, Request Migration, Speculative Decoding, and feature interactions.

 ## Latest News


--- a/docs/reference/support-matrix.md
+++ b/docs/reference/support-matrix.md
@@ -8,6 +8,8 @@ SPDX-License-Identifier: Apache-2.0

 This document provides the support matrix for Dynamo, including hardware, software and build instructions.

+> **See also:** [Feature Compatibility Matrix](../../feature-matrix.md) for backend-specific feature support (vLLM, TensorRT-LLM, SGLang).
+
 ## Hardware Compatibility

 | **CPU Architecture** | **Status**   |
@@ -60,15 +62,15 @@ If you are using a **GPU**, the following GPU models and architectures are suppo

 The following table shows the dependency versions included with each Dynamo release:

-| **Dependency** | **main (ToT)** | **v0.8.0 (unreleased)** | **v0.7.1** | **v0.7.0.post1** | **v0.7.0** |
-| :------------- | :------------- | :---------------------- | :--------- | :--------------- | :--------- |
-| SGLang         | 0.5.7          | 0.5.7                   | 0.5.3.post4| 0.5.3.post4      | 0.5.3.post4|
-| TensorRT-LLM   | 1.2.0rc6.post1 | 1.2.0rc6                | 1.2.0rc3   | 1.2.0rc3         | 1.2.0rc2   |
+| **Dependency** | **main (ToT)** | **v0.8.0** | **v0.7.1** | **v0.7.0.post1** | **v0.7.0** |
+| :------------- | :------------- | :--------- | :--------- | :--------------- | :--------- |
+| SGLang         | 0.5.7          | 0.5.6.post2 | 0.5.3.post4| 0.5.3.post4      | 0.5.3.post4|
+| TensorRT-LLM   | 1.2.0rc6.post1 | 1.2.0rc6.post1 | 1.2.0rc3   | 1.2.0rc3         | 1.2.0rc2   |
 | vLLM           | 0.13.0         | 0.12.0     | 0.11.0     | 0.11.0           | 0.11.0     |
 | NIXL           | 0.8.0          | 0.8.0      | 0.8.0      | 0.8.0            | 0.8.0      |

 > [!Note]
-> **main (ToT)** reflects the current development branch. **v0.8.0** is the upcoming release (planned for January 14, 2025) and not yet available.
+> **main (ToT)** reflects the current development branch.


 > [!Important]
@@ -76,9 +78,12 @@ The following table shows the dependency versions included with each Dynamo rele

 ### CUDA Support by Framework
 | **Dynamo Version**   | **SGLang**                        | **TensorRT-LLM**        | **vLLM**                          |
-| :------------------- | :-----------------------| :-----------------------| :-----------------------|
+| :------------------- | :-------------------------------- | :-----------------------| :-------------------------------- |
+| **Dynamo 0.8.0**     | CUDA 12.9, CUDA 13.0 (🧪)         | CUDA 13.0               | CUDA 12.9, CUDA 13.0 (🧪)         |
 | **Dynamo 0.7.1**     | CUDA 12.8                         | CUDA 13.0               | CUDA 12.9                         |

+> 🧪 = Experimental
+
 ## Cloud Service Provider Compatibility

 ### AWS
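
The dependency table updated in the diff above can be transcribed into a small lookup for checking an environment against a release's pins. This is a minimal sketch and not part of the commit: `PINNED` and `mismatches` are hypothetical names, and the versions are copied by hand from the post-change v0.8.0 and v0.7.1 columns.

```python
# Hypothetical helper, not a Dynamo API. Pins are transcribed by hand from
# the dependency table in docs/reference/support-matrix.md after this commit.
PINNED = {
    "v0.8.0": {
        "SGLang": "0.5.6.post2",
        "TensorRT-LLM": "1.2.0rc6.post1",
        "vLLM": "0.12.0",
        "NIXL": "0.8.0",
    },
    "v0.7.1": {
        "SGLang": "0.5.3.post4",
        "TensorRT-LLM": "1.2.0rc3",
        "vLLM": "0.11.0",
        "NIXL": "0.8.0",
    },
}


def mismatches(release, installed):
    """Return {dependency: (expected, found)} for each installed version that differs from the pin."""
    pins = PINNED[release]
    return {
        name: (pins[name], found)
        for name, found in installed.items()
        if name in pins and pins[name] != found
    }


if __name__ == "__main__":
    # Example input is made up: an environment with a newer vLLM than the v0.8.0 pin.
    print(mismatches("v0.8.0", {"vLLM": "0.13.0", "NIXL": "0.8.0"}))
```

The comparison is a plain string equality check, so it only reports exact differences from the pinned version and makes no claim about which other versions might work.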

--- a/feature-matrix.md
+++ b/feature-matrix.md
+# Dynamo Feature Compatibility Matrices
+
+This document provides a comprehensive compatibility matrix for key Dynamo features across the supported backends.
+
+*Updated for Dynamo v0.8.0*
+
+**Legend:**
+*   ✅ : Fully Supported / Compatible
+*   ❌ : Not Supported / Incompatible
+*   🚧 : Work in Progress
+*   ⚠️ : Limited Support (see notes)
+*   🧪 : Experimental
+
+## Quick Comparison
+
+| Feature | vLLM | TensorRT-LLM | SGLang | Source |
+| :--- | :---: | :---: | :---: | :--- |
+| **Disaggregated Serving** | ✅ | ✅ | ✅ | [Design Doc][disagg] |
+| **KV-Aware Routing** | ✅ | ✅ | ✅ | [Router Doc][kv-routing] |
+| **SLA-Based Planner** | ✅ | ✅ | ✅ | [Planner Doc][planner] |
+| **KV Block Manager** | ✅ | ✅ | 🚧 | [KVBM Doc][kvbm] |
+| **Multimodal (Image)** | ✅ | ✅ | ✅ | [Multimodal Doc][mm] |
+| **Multimodal (Video)** | ✅ | ❌ | ❌ | [Multimodal Doc][mm] |
+| **Multimodal (Audio)** | 🧪 | ❌ | ❌ | [Multimodal Doc][mm] |
+| **Request Migration** | ✅ | ⚠️ | ✅ | [Migration Doc][migration] |
+| **Request Cancellation** | ✅ | ✅ | ⚠️ | Backend READMEs |
+| **LoRA** | ✅ | ❌ | ❌ | [K8s Guide][lora] |
+| **Tool Calling** | ✅ | ✅ | ✅ | [Tool Calling Doc][tools] |
+| **Speculative Decoding** | ✅ | ✅ | 🚧 | Backend READMEs |
+
+## 1. vLLM Backend
+
+vLLM offers the broadest feature coverage in Dynamo, with full support for disaggregated serving, KV-aware routing, KV block management, LoRA adapters, and multimodal inference including video and audio.
+
+*Source: [docs/backends/vllm/README.md][vllm-readme]*
+
+| Feature | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA | Tool Calling | Speculative Decoding |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| **Disaggregated Serving** | — | | | | | | | | | |
+| **KV-Aware Routing** | ✅ | — | | | | | | | | |
+| **SLA-Based Planner** | ✅ | ✅ | — | | | | | | | |
+| **KV Block Manager** | ✅ | ✅ | ✅ | — | | | | | | |
+| **Multimodal** | ✅ | ❌<sup>1</sup> | — | ✅ | — | | | | | |
+| **Request Migration** | ✅ | ✅ | ✅ | ✅ | ✅ | — | | | | |
+| **Request Cancellation** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | | | |
+| **LoRA** | ✅ | ✅<sup>2</sup> | — | ✅ | — | ✅ | ✅ | — | | |
+| **Tool Calling** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | |
+| **Speculative Decoding** | ✅ | ✅ | — | ✅ | — | ✅ | ✅ | — | ✅ | — |
+
+> **Notes:**
+> 1. **Multimodal + KV-Aware Routing**: The KV router uses token-based hashing and does not yet support image/video hashes, so it falls back to random/round-robin routing. ([Source][kv-routing])
+> 2. **KV-Aware LoRA Routing**: vLLM supports routing requests based on LoRA adapter affinity.
+> 3. **Audio Support**: vLLM supports audio models like Qwen2-Audio (experimental). ([Source][mm-vllm])
+> 4. **Video Support**: vLLM supports video input with frame sampling. ([Source][mm-vllm])
+> 5. **Speculative Decoding**: Eagle3 support documented. ([Source][vllm-spec])
+
+## 2. TensorRT-LLM Backend
+
+TensorRT-LLM delivers maximum inference performance and optimization, with full KVBM integration and robust disaggregated serving support.
+
+*Source: [docs/backends/trtllm/README.md][trtllm-readme]*
+
+| Feature | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA | Tool Calling | Speculative Decoding |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| **Disaggregated Serving** | — | | | | | | | | | |
+| **KV-Aware Routing** | ✅ | — | | | | | | | | |
+| **SLA-Based Planner** | ✅ | ✅ | — | | | | | | | |
+| **KV Block Manager** | ✅ | ✅ | ✅ | — | | | | | | |
+| **Multimodal** | ✅<sup>1</sup> | ❌<sup>2</sup> | — | ✅ | — | | | | | |
+| **Request Migration** | ⚠️<sup>3</sup> | ✅ | ✅ | ✅ | ⚠️ | — | | | | |
+| **Request Cancellation** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | | | |
+| **LoRA** | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | — | | |
+| **Tool Calling** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | — | |
+| **Speculative Decoding** | ✅ | ✅ | — | ✅ | — | ✅ | ✅ | ❌ | ✅ | — |
+
+> **Notes:**
+> 1. **Multimodal Disaggregation**: Fully supports **EP/D** (Traditional) pattern. **E/P/D** (Full Disaggregation) is WIP and currently supports pre-computed embeddings only. ([Source][mm-trtllm])
+> 2. **Multimodal + KV-Aware Routing**: Not supported. The KV router currently tracks token-based blocks only. ([Source][kv-routing])
+> 3. **Request Migration**: Supported on **Decode/Aggregated** workers only. **Prefill** workers do not support migration. ([Source][trtllm-readme])
+> 4. **Speculative Decoding**: Llama 4 + Eagle support documented. ([Source][trtllm-eagle])
+
+## 3. SGLang Backend
+
+SGLang is optimized for high-throughput serving with fast primitives, providing robust support for disaggregated serving, KV-aware routing, and request migration.
+
+*Source: [docs/backends/sglang/README.md][sglang-readme]*
+
+| Feature | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA | Tool Calling | Speculative Decoding |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| **Disaggregated Serving** | — | | | | | | | | | |
+| **KV-Aware Routing** | ✅ | — | | | | | | | | |
+| **SLA-Based Planner** | ✅ | ✅ | — | | | | | | | |
+| **KV Block Manager** | 🚧 | 🚧 | 🚧 | — | | | | | | |
+| **Multimodal** | ✅<sup>2</sup> | ❌<sup>1</sup> | — | 🚧 | — | | | | | |
+| **Request Migration** | ✅ | ✅ | ✅ | 🚧 | ✅ | — | | | | |
+| **Request Cancellation** | ⚠️<sup>3</sup> | ✅ | ✅ | 🚧 | ⚠️ | ✅ | — | | | |
+| **LoRA** | ❌ | ❌ | ❌ | 🚧 | ❌ | ❌ | ❌ | — | | |
+| **Tool Calling** | ✅ | ✅ | ✅ | 🚧 | ✅ | ✅ | ✅ | ❌ | — | |
+| **Speculative Decoding** | 🚧 | 🚧 | — | 🚧 | — | 🚧 | — | ❌ | 🚧 | — |
+
+> **Notes:**
+> 1. **Multimodal + KV-Aware Routing**: Not supported. ([Source][kv-routing])
+> 2. **Multimodal Patterns**: Supports **E/PD** and **E/P/D** only (requires separate vision encoder). Does **not** support simple Aggregated (EPD) or Traditional Disagg (EP/D). ([Source][mm-sglang])
+> 3. **Request Cancellation**: Cancellation during the remote prefill phase is not supported in disaggregated mode. ([Source][sglang-readme])
+> 4. **Speculative Decoding**: Code hooks exist (`spec_decode_stats` in publisher), but no examples or documentation yet.
+
+---
+
+## Source References
+
+<!-- Backend READMEs -->
+[vllm-readme]: docs/backends/vllm/README.md
+[trtllm-readme]: docs/backends/trtllm/README.md
+[sglang-readme]: docs/backends/sglang/README.md
+
+<!-- Design Docs -->
+[disagg]: docs/design_docs/disagg_serving.md
+[kv-routing]: docs/router/kv_cache_routing.md
+[planner]: docs/planner/planner_intro.rst
+[kvbm]: docs/kvbm/kvbm_intro.rst
+[migration]: docs/fault_tolerance/request_migration.md
+[tools]: docs/agents/tool-calling.md
+
+<!-- Multimodal -->
+[mm]: docs/multimodal/index.md
+[mm-vllm]: docs/multimodal/vllm.md
+[mm-trtllm]: docs/multimodal/trtllm.md
+[mm-sglang]: docs/multimodal/sglang.md
+
+<!-- Feature-specific -->
+[lora]: docs/kubernetes/deployment/dynamomodel-guide.md
+[vllm-spec]: docs/backends/vllm/speculative_decoding.md
+[trtllm-eagle]: docs/backends/trtllm/llama4_plus_eagle.md
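
The "Quick Comparison" table added in `feature-matrix.md` can also be read as data. The sketch below transcribes it into a dictionary and answers "which backends support feature X?". It is illustrative only: the status strings, `QUICK`, and `backends_with` are hypothetical names, not anything shipped with Dynamo, and the rows are copied by hand from the table in this diff.

```python
# Hypothetical transcription of the "Quick Comparison" table; not a Dynamo API.
# Status names mirror the legend: full, none, work in progress, limited, experimental.
FULL, NONE, WIP, LIMITED, EXPERIMENTAL = "full", "none", "wip", "limited", "experimental"

# Column order matches the table header.
BACKENDS = ("vLLM", "TensorRT-LLM", "SGLang")

QUICK = {
    "Disaggregated Serving": (FULL, FULL, FULL),
    "KV-Aware Routing": (FULL, FULL, FULL),
    "SLA-Based Planner": (FULL, FULL, FULL),
    "KV Block Manager": (FULL, FULL, WIP),
    "Multimodal (Image)": (FULL, FULL, FULL),
    "Multimodal (Video)": (FULL, NONE, NONE),
    "Multimodal (Audio)": (EXPERIMENTAL, NONE, NONE),
    "Request Migration": (FULL, LIMITED, FULL),
    "Request Cancellation": (FULL, FULL, LIMITED),
    "LoRA": (FULL, NONE, NONE),
    "Tool Calling": (FULL, FULL, FULL),
    "Speculative Decoding": (FULL, FULL, WIP),
}


def backends_with(feature, accepted=(FULL,)):
    """List the backends whose status for `feature` is one of `accepted`."""
    row = QUICK[feature]
    return [backend for backend, status in zip(BACKENDS, row) if status in accepted]


if __name__ == "__main__":
    print(backends_with("LoRA"))
    print(backends_with("Request Migration", accepted=(FULL, LIMITED)))
```

Passing `accepted=(FULL, LIMITED)` widens the query to include backends marked with limited support; the per-backend notes in the diff still apply to those entries.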