**Decode Phase** - The second phase of LLM inference that generates output tokens one at a time.
**depends()** - A Dynamo function that creates dependencies between services, enabling automatic client generation and service discovery.
**Disaggregated Serving** - Dynamo's core architecture that separates prefill and decode phases into specialized engines to maximize GPU throughput and improve performance.
**Distributed Runtime** - Dynamo's Rust-based core system that manages service discovery, communication, and component lifecycle across distributed clusters.
**Dynamo** - NVIDIA's high-performance distributed inference framework for Large Language Models (LLMs) and generative AI models, designed for multinode environments with disaggregated serving and cache-aware routing.
**Dynamo Artifact** - A packaged archive containing an inference graph and its dependencies, created using `dynamo build`. It's the containerized, deployable version of a Graph.
**Dynamo Cloud** - A Kubernetes platform that provides a managed deployment experience for Dynamo inference graphs.
## E
...
@@ -80,5 +76,8 @@
## V
**vLLM** - High-throughput LLM serving engine with Ray distributed support and PagedAttention.
## W
**Wide Expert Parallelism (WideEP)** - Mixture-of-Experts deployment strategy that spreads experts across many GPUs (e.g., 64-way EP) so each GPU hosts only a few experts.
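
The "few experts per GPU" arithmetic can be illustrated with a minimal sketch. The helper `experts_per_gpu` and the figure of 256 experts are hypothetical, chosen only for illustration; this is not part of Dynamo's API, and real placement strategies may differ (for example, by replicating hot experts).

```python
def experts_per_gpu(num_experts: int, ep_degree: int) -> list[int]:
    """Return how many experts each GPU hosts under an even split.

    Hypothetical helper: spreads num_experts across ep_degree GPUs,
    giving the first (num_experts % ep_degree) ranks one extra expert.
    """
    if ep_degree <= 0:
        raise ValueError("ep_degree must be positive")
    base, extra = divmod(num_experts, ep_degree)
    return [base + 1 if rank < extra else base for rank in range(ep_degree)]


# Assumed example: a model with 256 experts deployed with 64-way EP.
counts = experts_per_gpu(256, 64)
print(max(counts))  # largest number of experts hosted by any single GPU
```

With these assumed numbers, every GPU hosts four experts, which is the situation the definition describes.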
## X
**xPyD (x Prefill y Decode)** - Dynamo notation describing disaggregated serving configurations where x prefill workers serve y decode workers. Dynamo supports runtime-reconfigurable xPyD.
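
To make the notation concrete, here is a minimal sketch that reads a label such as `2P4D` as two prefill workers and four decode workers. The `parse_xpyd` function and the `XPyD` type are hypothetical names introduced for illustration; they are not Dynamo's API, and Dynamo may represent these configurations differently.

```python
import re
from typing import NamedTuple


class XPyD(NamedTuple):
    """Hypothetical container for an xPyD configuration."""
    prefill_workers: int
    decode_workers: int


# Matches labels of the form "<x>P<y>D", e.g. "2P4D".
_XPYD_PATTERN = re.compile(r"(\d+)P(\d+)D")


def parse_xpyd(label: str) -> XPyD:
    """Parse an xPyD label into prefill and decode worker counts."""
    match = _XPYD_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not an xPyD label: {label!r}")
    return XPyD(int(match.group(1)), int(match.group(2)))


config = parse_xpyd("2P4D")
print(config.prefill_workers, config.decode_workers)
```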