---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Feature Matrix
---

This document provides a comprehensive compatibility matrix for key Dynamo features across the supported backends.

*Updated for Dynamo v0.9.0*

**Legend:**
*   ✅ : Supported
*   🚧 : Work in Progress / Experimental / Limited

## Quick Comparison

| Feature | SGLang | TensorRT-LLM | vLLM | Source |
| :--- | :---: | :---: | :---: | :--- |
| **Disaggregated Serving** | ✅ | ✅ | ✅ | [Design Doc][disagg] |
| **KV-Aware Routing** | ✅ | ✅ | ✅ | [Router Doc][kv-routing] |
| **SLA-Based Planner** | ✅ | ✅ | ✅ | [Planner Doc][planner] |
| **KV Block Manager** | 🚧 | ✅ | ✅ | [KVBM Doc][kvbm] |
| **Multimodal (Image)** | ✅ | ✅ | ✅ | [Multimodal Doc][mm] |
| **Multimodal (Video)** | | | ✅ | [Multimodal Doc][mm] |
| **Multimodal (Audio)** | | | 🚧 | [Multimodal Doc][mm] |
| **Request Migration** | ✅ | 🚧 | ✅ | [Migration Doc][migration] |
| **Request Cancellation** | 🚧 | ✅ | ✅ | Backend READMEs |
| **LoRA** | | | ✅ | [K8s Guide][lora] |
| **Tool Calling** | ✅ | ✅ | ✅ | [Tool Calling Doc][tools] |
| **Speculative Decoding** | 🚧 | ✅ | ✅ | Backend READMEs |
| **Dynamo Snapshot** | ✅ | | ✅ | [Snapshot Docs][snapshot] |
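
For readers who want to query the table programmatically, the rows above can be transcribed into a small lookup. This is only an illustrative sketch: `QUICK_COMPARISON` and `backends_supporting` are hypothetical names, not part of Dynamo, and empty table cells are treated here as "not listed as supported", which is an assumption because the legend does not define them.

```python
# Column order matches the Quick Comparison table header.
BACKENDS = ("SGLang", "TensorRT-LLM", "vLLM")

# Rows transcribed from the Quick Comparison table.
# "" stands for an empty cell (assumed: not listed as supported).
QUICK_COMPARISON = {
    "Disaggregated Serving": ("✅", "✅", "✅"),
    "KV-Aware Routing": ("✅", "✅", "✅"),
    "SLA-Based Planner": ("✅", "✅", "✅"),
    "KV Block Manager": ("🚧", "✅", "✅"),
    "Multimodal (Image)": ("✅", "✅", "✅"),
    "Multimodal (Video)": ("", "", "✅"),
    "Multimodal (Audio)": ("", "", "🚧"),
    "Request Migration": ("✅", "🚧", "✅"),
    "Request Cancellation": ("🚧", "✅", "✅"),
    "LoRA": ("", "", "✅"),
    "Tool Calling": ("✅", "✅", "✅"),
    "Speculative Decoding": ("🚧", "✅", "✅"),
    "Dynamo Snapshot": ("✅", "", "✅"),
}


def backends_supporting(features, include_partial=False):
    """Return the backends whose cell is ✅ for every requested feature.

    With include_partial=True, 🚧 (work in progress / experimental / limited)
    cells are accepted as well.
    """
    accepted = {"✅", "🚧"} if include_partial else {"✅"}
    return [
        backend
        for idx, backend in enumerate(BACKENDS)
        if all(QUICK_COMPARISON[feature][idx] in accepted for feature in features)
    ]


if __name__ == "__main__":
    print(backends_supporting(["Request Migration", "Tool Calling"]))
    print(backends_supporting(["Request Migration"], include_partial=True))
```

The lookup only reflects the per-feature summary; feature combinations on a single backend still need to be checked against the per-backend matrices and their notes.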

## 1. vLLM Backend

vLLM offers the broadest feature coverage in Dynamo, with full support for disaggregated serving, KV-aware routing, KV block management, LoRA adapters, and multimodal inference with image and video inputs. Audio input is experimental.

*Source: [docs/backends/vllm/README.md][vllm-readme]*

| Feature | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA | Tool Calling | Speculative Decoding |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Disaggregated Serving** | — | | | | | | | | | |
| **KV-Aware Routing** | ✅ | — | | | | | | | | |
| **SLA-Based Planner** | ✅ | ✅ | — | | | | | | | |
| **KV Block Manager** | ✅ | ✅ | ✅ | — | | | | | | |
| **Multimodal** | ✅ | <sup>1</sup> | — | ✅ | — | | | | | |
| **Request Migration** | ✅ | ✅ | ✅ | ✅ | ✅ | — | | | | |
| **Request Cancellation** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | | | |
| **LoRA** | ✅ | ✅<sup>2</sup> | — | ✅ | — | ✅ | ✅ | — | | |
| **Tool Calling** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | |
| **Speculative Decoding** | ✅ | ✅ | — | ✅ | — | ✅ | ✅ | — | ✅ | — |

> **Notes:**
> 1. **Multimodal + KV-Aware Routing**: The KV router uses token-based hashing and does not yet support image/video hashes, so it falls back to random/round-robin routing. ([Source][kv-routing])
> 2. **KV-Aware LoRA Routing**: vLLM supports routing requests based on LoRA adapter affinity.
> 3. **Audio Support**: vLLM supports audio models like Qwen2-Audio (experimental). ([Source][mm-vllm])
> 4. **Video Support**: vLLM supports video input with frame sampling. ([Source][mm-vllm])
> 5. **Speculative Decoding**: Eagle3 support documented. ([Source][vllm-spec])

## 2. SGLang Backend

SGLang is optimized for high-throughput serving and supports disaggregated serving, KV-aware routing, and request migration. KV block management, request cancellation, and speculative decoding are still work in progress.

*Source: [docs/backends/sglang/README.md][sglang-readme]*

| Feature | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA | Tool Calling | Speculative Decoding |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Disaggregated Serving** | — | | | | | | | | | |
| **KV-Aware Routing** | ✅ | — | | | | | | | | |
| **SLA-Based Planner** | ✅ | ✅ | — | | | | | | | |
| **KV Block Manager** | 🚧 | 🚧 | 🚧 | — | | | | | | |
| **Multimodal** | ✅<sup>2</sup> | <sup>1</sup> | — | 🚧 | — | | | | | |
| **Request Migration** | ✅ | ✅ | ✅ | 🚧 | ✅ | — | | | | |
| **Request Cancellation** | 🚧<sup>3</sup> | ✅ | ✅ | 🚧 | 🚧 | ✅ | — | | | |
| **LoRA** | | | | 🚧 | | | | — | | |
| **Tool Calling** | ✅ | ✅ | ✅ | 🚧 | ✅ | ✅ | ✅ | | — | |
| **Speculative Decoding** | 🚧 | 🚧 | — | 🚧 | — | 🚧 | — | | 🚧 | — |

> **Notes:**
> 1. **Multimodal + KV-Aware Routing**: Not supported. ([Source][kv-routing])
> 2. **Multimodal Patterns**: Supports **E/PD** and **E/P/D** only (requires separate vision encoder). Does **not** support simple Aggregated (EPD) or Traditional Disagg (EP/D). ([Source][mm-sglang])
> 3. **Request Cancellation**: Cancellation during the remote prefill phase is not supported in disaggregated mode. ([Source][sglang-readme])
> 4. **Speculative Decoding**: Code hooks exist (`spec_decode_stats` in publisher), but no examples or documentation yet.

## 3. TensorRT-LLM Backend

TensorRT-LLM targets maximum inference performance, with full KVBM integration and disaggregated serving support.

*Source: [docs/backends/trtllm/README.md][trtllm-readme]*

| Feature | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA | Tool Calling | Speculative Decoding |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Disaggregated Serving** | — | | | | | | | | | |
| **KV-Aware Routing** | ✅ | — | | | | | | | | |
| **SLA-Based Planner** | ✅ | ✅ | — | | | | | | | |
| **KV Block Manager** | ✅ | ✅ | ✅ | — | | | | | | |
| **Multimodal** | ✅<sup>1</sup> | <sup>2</sup> | — | ✅ | — | | | | | |
| **Request Migration** | ✅ | ✅ | ✅ | ✅ | 🚧 | — | | | | |
| **Request Cancellation** | ✅<sup>3</sup> | ✅<sup>3</sup> | ✅<sup>3</sup> | ✅<sup>3</sup> | ✅<sup>3</sup> | ✅<sup>3</sup> | — | | | |
| **LoRA** | | | | | | | | — | | |
| **Tool Calling** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | — | |
| **Speculative Decoding** | ✅ | ✅ | — | ✅ | — | ✅ | ✅ | | ✅ | — |

> **Notes:**
> 1. **Multimodal Disaggregation**: Fully supports **EP/D** (Traditional) pattern. **E/P/D** (Full Disaggregation) is WIP and currently supports pre-computed embeddings only. ([Source][mm-trtllm])
> 2. **Multimodal + KV-Aware Routing**: Not supported. The KV router currently tracks token-based blocks only. ([Source][kv-routing])
> 3. **Request Cancellation**: Due to known issues, the TensorRT-LLM engine is temporarily not notified of request cancellations, so resources allocated for cancelled requests are not freed.

---


{/* Backend READMEs — paths relative to rendered URL /getting-started/feature-matrix */}
[vllm-readme]: ../backends/v-llm
[sglang-readme]: ../backends/sg-lang
[trtllm-readme]: ../backends/tensor-rt-llm

{/* Design Docs */}
[disagg]: ../design-docs/disaggregated-serving
[kv-routing]: ../components/router/router-guide
[planner]: ../components/planner
[kvbm]: ../components/kvbm
[migration]: ../user-guides/fault-tolerance/request-migration
[tools]: ../user-guides/tool-calling

{/* Multimodal */}
[mm]: ../user-guides/multimodality-support
[mm-vllm]: ../user-guides/multimodality-support/v-llm-multimodal
[mm-trtllm]: ../user-guides/multimodality-support/tensor-rt-llm-multimodal
[mm-sglang]: ../user-guides/multimodality-support/sg-lang-multimodal

{/* Feature-specific */}
[lora]: ../kubernetes-deployment/deployment-guide/managing-models-with-dynamo-model
[vllm-spec]: ../additional-resources/speculative-decoding/speculative-decoding-with-v-llm
[trtllm-eagle]: ../additional-resources/tensor-rt-llm-details/llama-4-eagle

{/* Dynamo Snapshot */}
[snapshot]: ../kubernetes/snapshot/README.md