Commit 4a718028 authored by Julien Mancuso, committed by GitHub

feat: revamp kubernetes doc (#3173)


Signed-off-by: Julien Mancuso <161955438+julienmancuso@users.noreply.github.com>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
parent 13a5d61b
......@@ -50,13 +50,13 @@ Quickstart
:hidden:
:caption: Kubernetes Deployment
Quickstart (K8s) <../guides/dynamo_deploy/README.md>
Detailed Installation Guide <../guides/dynamo_deploy/installation_guide.md>
Dynamo Operator <../guides/dynamo_deploy/dynamo_operator.md>
Metrics <../guides/dynamo_deploy/metrics.md>
Logging <../guides/dynamo_deploy/logging.md>
Multinode <../guides/dynamo_deploy/multinode-deployment.md>
Minikube Setup <../guides/dynamo_deploy/minikube.md>
Quickstart (K8s) <../kubernetes/README.md>
Detailed Installation Guide <../kubernetes/installation_guide.md>
Dynamo Operator <../kubernetes/dynamo_operator.md>
Metrics <../kubernetes/metrics.md>
Logging <../kubernetes/logging.md>
Multinode <../kubernetes/multinode-deployment.md>
Minikube Setup <../kubernetes/minikube.md>
.. toctree::
:hidden:
......
......@@ -31,12 +31,11 @@ helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${REL
helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default
# 3. Install Platform
kubectl create namespace ${NAMESPACE}
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE}
helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace
```
For more details or customization options, see **[Installation Guide for Dynamo Kubernetes Platform](/docs/guides/dynamo_deploy/installation_guide.md)**.
For more details or customization options (including multinode deployments), see **[Installation Guide for Dynamo Kubernetes Platform](/docs/kubernetes/installation_guide.md)**.
## 2. Choose Your Backend
......@@ -44,9 +43,9 @@ Each backend has deployment examples and configuration options:
| Backend | Available Configurations |
|---------|--------------------------|
| **[vLLM](/components/backends/vllm/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Router, Disaggregated + Planner |
| **[vLLM](/components/backends/vllm/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Router, Disaggregated + Planner, Disaggregated Multi-node |
| **[SGLang](/components/backends/sglang/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Planner, Disaggregated Multi-node |
| **[TensorRT-LLM](/components/backends/trtllm/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Router |
| **[TensorRT-LLM](/components/backends/trtllm/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Router, Disaggregated Multi-node |
## 3. Deploy Your First Model
......@@ -73,15 +72,15 @@ It's a Kubernetes Custom Resource that defines your inference pipeline:
- Scaling policies
- Frontend/backend connections
The scripts in the `components/<backend>/launch` folder like `agg.sh` demonstrate how you can serve your models locally. The corresponding YAML files like `agg.yaml` show you how you could create a kubernetes deployment for your inference graph.
Refer to the [API Reference and Documentation](/docs/kubernetes/api_reference.md) for more details.
## 📖 API Reference & Documentation
For detailed technical specifications of Dynamo's Kubernetes resources:
- **[API Reference](/docs/guides/dynamo_deploy/api_reference.md)** - Complete CRD field specifications for `DynamoGraphDeployment` and `DynamoComponentDeployment`
- **[Operator Guide](/docs/guides/dynamo_deploy/dynamo_operator.md)** - Dynamo operator configuration and management
- **[Create Deployment](/docs/guides/dynamo_deploy/create_deployment.md)** - Step-by-step deployment creation examples
- **[API Reference](/docs/kubernetes/api_reference.md)** - Complete CRD field specifications for `DynamoGraphDeployment` and `DynamoComponentDeployment`
- **[Operator Guide](/docs/kubernetes/dynamo_operator.md)** - Dynamo operator configuration and management
- **[Create Deployment](/docs/kubernetes/create_deployment.md)** - Step-by-step deployment creation examples
### Choosing Your Architecture Pattern
......@@ -165,7 +164,12 @@ Key customization points include:
## Additional Resources
- **[Examples](/examples/README.md)** - Complete working examples
- **[Create Custom Deployments](/docs/guides/dynamo_deploy/create_deployment.md)** - Build your own CRDs
- **[Operator Documentation](/docs/guides/dynamo_deploy/dynamo_operator.md)** - How the platform works
- **[Create Custom Deployments](/docs/kubernetes/create_deployment.md)** - Build your own CRDs
- **[Operator Documentation](/docs/kubernetes/dynamo_operator.md)** - How the platform works
- **[Helm Charts](/deploy/helm/README.md)** - For advanced users
- **[GitOps Deployment with FluxCD](/docs/guides/dynamo_deploy/fluxcd.md)** - For advanced users
\ No newline at end of file
- **[GitOps Deployment with FluxCD](/docs/kubernetes/fluxcd.md)** - For advanced users
- **[Logging](/docs/kubernetes/logging.md)** - For logging setup
- **[Multinode Deployment](/docs/kubernetes/multinode-deployment.md)** - For multinode deployment
- **[Grove](/docs/kubernetes/grove.md)** - For grove details and custom installation
- **[Monitoring](/docs/kubernetes/metrics.md)** - For monitoring setup
- **[Model Caching with Fluid](/docs/kubernetes/model_caching_with_fluid.md)** - For model caching with Fluid
\ No newline at end of file
......@@ -13,13 +13,13 @@ Select the architecture pattern as your template that best fits your use case.
For example, when using the `VLLM` inference backend:
- **Development / Testing**
Use [`agg.yaml`](../../../components/backends/vllm/deploy/agg.yaml) as the base configuration.
Use [`agg.yaml`](/components/backends/vllm/deploy/agg.yaml) as the base configuration.
- **Production with Load Balancing**
Use [`agg_router.yaml`](../../../components/backends/vllm/deploy/agg_router.yaml) to enable scalable, load-balanced inference.
Use [`agg_router.yaml`](/components/backends/vllm/deploy/agg_router.yaml) to enable scalable, load-balanced inference.
- **High Performance / Disaggregated Deployment**
Use [`disagg_router.yaml`](../../../components/backends/vllm/deploy/disagg_router.yaml) for maximum throughput and modular scalability.
Use [`disagg_router.yaml`](/components/backends/vllm/deploy/disagg_router.yaml) for maximum throughput and modular scalability.
## Step 2: Customize the Template
......@@ -90,7 +90,7 @@ Consult the corresponding sh file. Each of the python commands to launch a compo
The front end is launched with `python3 -m dynamo.frontend [--http-port 8000] [--router-mode kv]`.
Each worker launches a `python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flags` command.
If you are a Dynamo contributor, see the [dynamo run guide](../dynamo_run.md) for details on how to run this command.
If you are a Dynamo contributor, see the [dynamo run guide](/docs/guides/dynamo_run.md) for details on how to run this command.
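The two launch commands above can be sketched as a small script. This is a sketch under assumptions: the backend module name `dynamo.vllm` and the `YOUR_MODEL` placeholder are chosen for illustration only, so check the backend's launch script for the actual values. The script prints the commands rather than starting any process.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions for illustration only: backend module suffix and model name.
BACKEND="${BACKEND:-vllm}"
MODEL="${MODEL:-YOUR_MODEL}"

# Frontend: the port and router mode are the optional flags named in the text above.
frontend_cmd=(python3 -m dynamo.frontend --http-port 8000 --router-mode kv)

# Worker: same shape as the worker command in the text above.
worker_cmd=(python3 -m "dynamo.${BACKEND}" --model "${MODEL}")

# Print both commands, one per line, instead of executing them.
printf '%s\n' "${frontend_cmd[*]}" "${worker_cmd[*]}"
```

Keeping each command in an array makes it straightforward to compare the arguments against the matching fields of the deployment YAML before running anything.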
## Step 3: Key Customization Points
......
......@@ -23,11 +23,11 @@ Dynamo operator is a Kubernetes operator that simplifies the deployment, configu
For the complete technical API reference for Dynamo Custom Resource Definitions, see:
**📖 [Dynamo CRD API Reference](/docs/guides/dynamo_deploy/api_reference.md)**
**📖 [Dynamo CRD API Reference](/docs/kubernetes/api_reference.md)**
## Installation
[See installation steps](/docs/guides/dynamo_deploy/installation_guide.md#overview)
[See installation steps](/docs/kubernetes/installation_guide.md#overview)
## Development
......
# GitOps Deployment with FluxCD
This section describes how to use FluxCD for GitOps-based deployment of Dynamo inference graphs. GitOps enables you to manage your Dynamo deployments declaratively using Git as the source of truth. We'll use the [aggregated vLLM example](../../../components/backends/vllm/README.md) to demonstrate the workflow.
This section describes how to use FluxCD for GitOps-based deployment of Dynamo inference graphs. GitOps enables you to manage your Dynamo deployments declaratively using Git as the source of truth. We'll use the [aggregated vLLM example](/components/backends/vllm/README.md) to demonstrate the workflow.
## Prerequisites
- A Kubernetes cluster with [Dynamo Cloud](/docs/guides/dynamo_deploy/installation_guide.md) installed
- A Kubernetes cluster with [Dynamo Cloud](/docs/kubernetes/installation_guide.md) installed
- [FluxCD](https://fluxcd.io/flux/installation/) installed in your cluster
- A Git repository to store your deployment configurations
......@@ -18,7 +18,7 @@ The GitOps workflow for Dynamo deployments consists of three main steps:
## Step 1: Build and Push Dynamo Cloud Operator
First, follow [Install Dynamo Cloud](/docs/guides/dynamo_deploy/installation_guide.md).
First, follow [Install Dynamo Cloud](/docs/kubernetes/installation_guide.md).
## Step 2: Create Initial Deployment
......
......@@ -19,7 +19,7 @@ Grove enables disaggregated serving by breaking down large language model infere
Grove implements disaggregated serving through several custom Kubernetes resources that provide declarative composition of role-based pod groups:
### PodGangSet
### PodCliqueSet
The top-level Grove object that defines a group of components managed and colocated together. Key features include:
- Support for autoscaling
- Topology-aware spread of replicas for availability
......@@ -39,10 +39,10 @@ A set of PodCliques that scale and are scheduled together, ideal for tightly cou
Grove provides several specialized features that make it particularly well-suited for disaggregated serving:
### Flexible Gang Scheduling
PodCliques and PodCliqueScalingGroups allow users to specify flexible gang-scheduling requirements at multiple levels within a PodGangSet to prevent resource deadlocks and ensure all components of a disaggregated system start together.
PodCliques and PodCliqueScalingGroups allow users to specify flexible gang-scheduling requirements at multiple levels within a PodCliqueSet to prevent resource deadlocks and ensure all components of a disaggregated system start together.
### Multi-level Horizontal Auto-Scaling
Supports pluggable horizontal auto-scaling solutions to scale PodGangSet, PodClique, and PodCliqueScalingGroup custom resources independently based on their specific metrics and requirements.
Supports pluggable horizontal auto-scaling solutions to scale PodCliqueSet, PodClique, and PodCliqueScalingGroup custom resources independently based on their specific metrics and requirements.
### Network Topology-Aware Scheduling
Allows specifying network topology pack and spread constraints to optimize for both network performance and service availability, crucial for disaggregated systems where components need efficient inter-node communication.
......
......@@ -21,7 +21,7 @@ Deploy and manage Dynamo inference graphs on Kubernetes with automated orchestra
## Quick Start Paths
The platform is installed using the Dynamo Kubernetes Platform [helm chart](../../../deploy/cloud/helm/platform/README.md).
The platform is installed using the Dynamo Kubernetes Platform [helm chart](/deploy/cloud/helm/platform/README.md).
**Path A: Production Install**
Install from published artifacts on your existing cluster → [Jump to Path A](#path-a-production-install)
......@@ -32,6 +32,20 @@ Set up Minikube first → [Minikube Setup](minikube.md) → Then follow Path A
**Path C: Custom Development**
Build from source for customization → [Jump to Path C](#path-c-custom-development)
The values used by any helm install command can be overridden either by editing the chart's values.yaml file or by passing in your own values file:
```bash
helm install ...
-f your-values.yaml
```
and/or setting values as flags to the helm install command, as follows:
```bash
helm install ...
--set "your-value=your-value"
```
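As a minimal sketch of how the two override mechanisms combine, the script below writes an override file and assembles the matching `helm install` command. It assumes the `dynamo-platform` chart archive from Path A; the values file and its contents are illustrative, with `grove.enabled` being the only key taken from this guide. The script only prints the command. When the same key appears in both places, Helm gives `--set` precedence over `-f`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Defaults mirror example values used elsewhere in these docs; override as needed.
NAMESPACE="${NAMESPACE:-dynamo-cloud}"
RELEASE_VERSION="${RELEASE_VERSION:-0.3.2}"

# Hypothetical override file; grove.enabled is the only key taken from this guide.
values_file="$(mktemp)"
cat > "${values_file}" <<'EOF'
grove:
  enabled: true
EOF

# Assemble the command as an array so every argument stays safely quoted.
helm_cmd=(
  helm install dynamo-platform "dynamo-platform-${RELEASE_VERSION}.tgz"
  --namespace "${NAMESPACE}"
  -f "${values_file}"
  --set "grove.enabled=true"
)

# Print instead of executing; run "${helm_cmd[@]}" once the output looks right.
printf '%s\n' "${helm_cmd[*]}"
```

Keeping long-lived settings in the values file and reserving `--set` for one-off overrides tends to make installs easier to reproduce.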
## Prerequisites
```bash
......@@ -68,7 +82,9 @@ helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace
```
> [!TIP]
> By default, Grove and Kai Scheduler are NOT installed. You can enable them by setting the following flags in the helm install command:
> For multinode deployments, you need to enable Grove and Kai Scheduler.
> You can choose to install them manually or through the dynamo-platform helm install command.
> When using the dynamo-platform helm install command, Grove and Kai Scheduler are NOT installed by default. You can enable their installation by setting the following flags in the helm install command:
```bash
--set "grove.enabled=true"
......@@ -111,7 +127,7 @@ docker build -t $DOCKER_SERVER/dynamo-operator:$IMAGE_TAG . && docker push $DOCK
cd -
# 3. Create namespace and secrets to be able to pull the operator image
# 3. Create namespace and secrets to be able to pull the operator image (only needed if you pushed the operator image to a private registry)
kubectl create namespace ${NAMESPACE}
kubectl create secret docker-registry docker-imagepullsecret \
--docker-server=${DOCKER_SERVER} \
......@@ -123,9 +139,8 @@ kubectl create secret docker-registry docker-imagepullsecret \
helm upgrade --install dynamo-crds ./crds/ --namespace default
# 5. Install Platform
helm repo add bitnami https://charts.bitnami.com/bitnami
helm dep build ./platform/
helm upgrade --install dynamo-platform ./platform/ \
helm install dynamo-platform ./platform/ \
--namespace ${NAMESPACE} \
--set dynamo-operator.controllerManager.manager.image.repository=${DOCKER_SERVER}/dynamo-operator \
--set dynamo-operator.controllerManager.manager.image.tag=${IMAGE_TAG} \
......@@ -158,9 +173,9 @@ kubectl get pods -n ${NAMESPACE}
```
2. **Explore Backend Guides**
- [vLLM Deployments](../../../components/backends/vllm/deploy/README.md)
- [SGLang Deployments](../../../components/backends/sglang/deploy/README.md)
- [TensorRT-LLM Deployments](../../../components/backends/trtllm/deploy/README.md)
- [vLLM Deployments](/components/backends/vllm/deploy/README.md)
- [SGLang Deployments](/components/backends/sglang/deploy/README.md)
- [TensorRT-LLM Deployments](/components/backends/trtllm/deploy/README.md)
3. **Optional:**
- [Set up Prometheus & Grafana](metrics.md)
......@@ -200,7 +215,7 @@ just add the following to the helm install command:
## Advanced Options
- [Helm Chart Configuration](../../../deploy/cloud/helm/platform/README.md)
- [Helm Chart Configuration](/deploy/cloud/helm/platform/README.md)
- [GKE-specific setup](gke_setup.md)
- [Create custom deployments](create_deployment.md)
- [Dynamo Operator details](dynamo_operator.md)
......
......@@ -28,7 +28,7 @@ helm install prometheus -n monitoring --create-namespace prometheus-community/ku
> The commands below assume you have installed the kube-prometheus-stack using the installation method listed above. Depending on how your monitoring stack is configured, you may need to adjust the `kubectl` commands that follow in this document (e.g., by changing Namespace or Service names).
### Install Dynamo Operator
Before setting up metrics collection, you'll need to have the Dynamo operator installed in your cluster. Follow our [Installation Guide](../dynamo_deploy/installation_guide.md) for detailed instructions on deploying the Dynamo operator.
Before setting up metrics collection, you'll need to have the Dynamo operator installed in your cluster. Follow our [Installation Guide](/docs/kubernetes/installation_guide.md) for detailed instructions on deploying the Dynamo operator.
Make sure to set the `prometheusEndpoint` to the Prometheus endpoint you installed in the previous step.
```bash
......@@ -64,8 +64,8 @@ This will create two components:
- A Worker component exposing metrics on its system port
Both components expose a `/metrics` endpoint following the OpenMetrics format, but with different metrics appropriate to their roles. For details about:
- Deployment configuration: See the [vLLM README](../../components/backends/vllm/README.md)
- Available metrics: See the [metrics guide](../metrics.md)
- Deployment configuration: See the [vLLM README](/components/backends/vllm/README.md)
- Available metrics: See the [metrics guide](/docs/guides/metrics.md)
### Validate the Deployment
......
......@@ -106,7 +106,7 @@ Hello star!
Note that this is a very simple, degenerate example that does not demonstrate the standard Dynamo FrontEnd-Backend deployment. The hello-world client is not a web server; it is a one-off function that sends the predefined text "world,sun,moon,star" to the backend. The example is meant to show the HelloWorldWorker. As such, you will only see the HelloWorldWorker pod in the deployment. The client will run and exit, and the pod will not be operational.
Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/README.md) to install Dynamo Kubernetes Platform.
Follow the [Quickstart Guide](../../../docs/kubernetes/README.md) to install Dynamo Kubernetes Platform.
Then deploy to kubernetes using
```bash
......
......@@ -90,7 +90,7 @@ git clone https://github.com/ai-dynamo/dynamo.git
cd dynamo
```
2. Install Dynamo from Published Artifacts on NGC (see the [Dynamo Cloud guide](../../../docs/guides/dynamo_deploy/installation_guide.md)):
2. Install Dynamo from Published Artifacts on NGC (see the [Dynamo Cloud guide](../../../docs/kubernetes/installation_guide.md)):
```bash
export NAMESPACE=dynamo-cloud
export RELEASE_VERSION=0.3.2
......@@ -124,7 +124,7 @@ dynamo-platform-nats-0 2/2 Runnin
dynamo-platform-nats-box-5dbf45c748-kln82 1/1 Running 0 2m51s
```
There are other ways to install Dynamo; you can find them [here](../../../docs/guides/dynamo_deploy/installation_guide.md).
There are other ways to install Dynamo; you can find them [here](../../../docs/kubernetes/installation_guide.md).
### Task 4. Deploy a model
......
......@@ -19,7 +19,7 @@ export NAMESPACE=your-namespace
kubectl create namespace ${NAMESPACE}
```
2. **Dynamo Cloud Platform installed** - Follow [Quickstart Guide](../docs/guides/dynamo_deploy/README.md)
2. **Dynamo Cloud Platform installed** - Follow [Quickstart Guide](../docs/kubernetes/README.md)
3. **Kubernetes cluster with GPU support**
......