..
    SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    SPDX-License-Identifier: Apache-2.0

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

Welcome to NVIDIA Dynamo
========================

The NVIDIA Dynamo Platform is a high-performance, low-latency inference framework designed to serve all AI models across any framework, architecture, or deployment scale.

.. admonition:: 💎 Discover the latest developments!
   :class: seealso

   This guide is a snapshot of the `Dynamo GitHub Repository `_ at a specific point in time. For the latest information and examples, see:

   - `Dynamo README `_
   - `Architecture and features doc `_
   - `Usage guides `_
   - `Dynamo examples repo `_

Quick Start
-----------------

Follow the :doc:`Quick Guide to install Dynamo Platform `.

Dive in: Examples
-----------------

The examples below assume that you build the latest image yourself from source. If you are using a prebuilt image, follow the examples from the corresponding branch.

.. grid:: 1 2 2 2
    :gutter: 3
    :margin: 0
    :padding: 3 4 0 0

    .. grid-item-card:: :doc:`Hello World `
        :link: examples/runtime/hello_world/README
        :link-type: doc

        Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph.

    .. grid-item-card:: :doc:`LLM Serving with vLLM `
        :link: components/backends/vllm/README
        :link-type: doc

        Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with vLLM.

    .. grid-item-card:: :doc:`Multinode with SGLang `
        :link: components/backends/sglang/docs/multinode-examples
        :link-type: doc

        Demonstrates disaggregated serving on several nodes.

    .. grid-item-card:: :doc:`TensorRT-LLM `
        :link: components/backends/trtllm/README
        :link-type: doc

        Presents TensorRT-LLM examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.

.. toctree::
   :hidden:

   Welcome to Dynamo
   Support Matrix

.. toctree::
   :hidden:
   :caption: Architecture & Features

   High Level Architecture
   Distributed Runtime
   Disaggregated Serving
   KV Block Manager
   KV Cache Routing
   Planner
   Dynamo Architecture Flow

.. toctree::
   :hidden:
   :caption: Using Dynamo

   Running Inference Graphs Locally (dynamo-run)
   Deploying Inference Graphs

.. toctree::
   :hidden:
   :caption: Usage Guides

   Writing Python Workers in Dynamo
   Disaggregation and Performance Tuning
   KV Cache Router Performance Tuning
   Working with Dynamo Kubernetes Operator

.. toctree::
   :hidden:
   :caption: Deployment Guides

   Dynamo Deploy Quickstart
   Dynamo Cloud Kubernetes Platform
   Manual Helm Deployment
   GKE Setup Guide
   Minikube Setup Guide
   Model Caching with Fluid

.. toctree::
   :hidden:
   :caption: Benchmarking

   Planner Benchmark Example

.. toctree::
   :hidden:
   :caption: API

   NIXL Connect API

.. toctree::
   :hidden:
   :caption: Examples

   Hello World
   LLM Deployment Examples using vLLM
   Multinode Examples using SGLang
   LLM Deployment Examples using TensorRT-LLM

.. toctree::
   :hidden:
   :caption: Reference

   Glossary
   KVBM Reading