Unverified commit 7fbd43ae, authored by Anish, committed by GitHub

docs: Update dynamo_glossary.md (#2082)


Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
parent 3175b10d
@@ -11,16 +11,12 @@
 ## D
 **Decode Phase** - The second phase of LLM inference that generates output tokens one at a time.
-**depends()** - A Dynamo function that creates dependencies between services, enabling automatic client generation and service discovery.
 **Disaggregated Serving** - Dynamo's core architecture that separates prefill and decode phases into specialized engines to maximize GPU throughput and improve performance.
 **Distributed Runtime** - Dynamo's Rust-based core system that manages service discovery, communication, and component lifecycle across distributed clusters.
 **Dynamo** - NVIDIA's high-performance distributed inference framework for Large Language Models (LLMs) and generative AI models, designed for multinode environments with disaggregated serving and cache-aware routing.
-**Dynamo Artifact** - A packaged archive containing an inference graph and its dependencies, created using `dynamo build`. It's the containerized, deployable version of a Graph.
 **Dynamo Cloud** - A Kubernetes platform providing managed deployment experience for Dynamo inference graphs.
 ## E
@@ -80,5 +76,8 @@
 ## V
 **vLLM** - High-throughput LLM serving engine with Ray distributed support and PagedAttention.
+## W
+**Wide Expert Parallelism (WideEP)** - Mixture-of-Experts deployment strategy that spreads experts across many GPUs (e.g., 64-way EP) so each GPU hosts only a few experts.
 ## X
 **xPyD (x Prefill y Decode)** - Dynamo notation describing disaggregated serving configurations where x prefill workers serve y decode workers. Dynamo supports runtime-reconfigurable xPyD.