@@ -24,21 +24,22 @@ Dynamo supports multinode deployments through the `multinode` section in resourc
For sophisticated multinode deployments, Dynamo integrates with advanced Kubernetes orchestration systems:
- **[Grove](https://github.com/NVIDIA/grove/blob/main/docs/installation.md)**: Network topology-aware gang scheduling and auto-scaling for AI workloads
- **[Grove](https://github.com/NVIDIA/grove)**: Network topology-aware gang scheduling and auto-scaling for AI workloads
- (optional) **[KAI-Scheduler](https://github.com/NVIDIA/KAI-Scheduler)**: Kubernetes native scheduler optimized for AI workloads at scale
- **[KAI-Scheduler](https://github.com/NVIDIA/KAI-Scheduler)**: Kubernetes native scheduler optimized for AI workloads at scale
These systems provide enhanced scheduling capabilities including topology-aware placement, gang scheduling, and coordinated auto-scaling across multiple nodes.
**Features Enabled with Grove:**
- Hierarchical gang scheduling with `PodGangSet` and `PodClique`
- Declarative composition of AI workloads
- Multi-level horizontal auto-scaling
- Custom startup ordering for components
- Resource-aware rolling updates
[KAI-Scheduler](https://github.com/NVIDIA/KAI-Scheduler) is an optional enhancement that provides a Kubernetes native scheduler optimized for AI workloads at large scale.
[KAI-Scheduler](https://github.com/NVIDIA/KAI-Scheduler) is a Kubernetes native scheduler optimized for AI workloads at large scale.
**Features Enabled with KAI-Scheduler:**
- Gang scheduling
- Network topology-aware pod placement
- AI workload-optimized scheduling algorithms
- GPU resource awareness and allocation
...
@@ -46,6 +47,14 @@ These systems provide enhanced scheduling capabilities including topology-aware
- Integration with Grove for enhanced capabilities
- Performance optimizations for large-scale deployments
##### Prerequisites
- [Grove](https://github.com/NVIDIA/grove/blob/main/docs/installation.md) installed on the cluster
- (Optional) [KAI-Scheduler](https://github.com/NVIDIA/KAI-Scheduler) installed on the cluster, with a queue named `dynamo` (the default queue name) created. You can use a different queue name by setting the `nvidia.com/kai-scheduler-queue` annotation on the DGD resource.
KAI-Scheduler is optional but recommended for advanced scheduling capabilities.
#### Using LWS and Volcano
LWS is a simple multinode deployment mechanism that allows you to deploy a workload across multiple nodes.
...
@@ -58,14 +67,68 @@ Volcano is a Kubernetes native scheduler optimized for AI workloads at scale. It
## Core Concepts
### Orchestrator Selection Algorithm
Dynamo automatically selects the best available orchestrator for multinode deployments using the following logic:
#### When Both Grove and LWS are Available:
- **Grove is selected by default** (recommended for advanced AI workloads)
- **LWS is selected** if you explicitly set the `nvidia.com/enable-grove: "false"` annotation on your DGD resource
#### When Only One Orchestrator is Available:
- The installed orchestrator (Grove or LWS) is automatically selected
#### Scheduler Integration:
- **With Grove**: Automatically integrates with [KAI-Scheduler](https://github.com/NVIDIA/KAI-Scheduler) when available, providing:
  - Advanced queue management via `nvidia.com/kai-scheduler-queue` annotation
  - AI-optimized scheduling policies
  - Resource-aware workload placement
- **With LWS**: Uses Volcano scheduler for gang scheduling and resource coordination
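
The selection rules above can be sketched in a few lines of Python. This is an illustrative model only, not the operator's actual implementation: the function names, the string return values, and the `"default"` fallback when KAI-Scheduler is absent are assumptions made for this example. Only the annotation keys and the default queue name `dynamo` come from the text above.

```python
from typing import Optional

# Annotation keys taken from the documentation above.
GROVE_ANNOTATION = "nvidia.com/enable-grove"
QUEUE_ANNOTATION = "nvidia.com/kai-scheduler-queue"
DEFAULT_QUEUE = "dynamo"


def select_orchestrator(
    annotations: dict, grove_available: bool, lws_available: bool
) -> Optional[str]:
    """Hypothetical helper mirroring the documented selection rules."""
    grove_disabled = annotations.get(GROVE_ANNOTATION) == "false"
    if grove_available and lws_available:
        # Grove wins by default; the annotation opts out in favor of LWS.
        return "lws" if grove_disabled else "grove"
    # Only one (or neither) is installed: use whatever is there.
    # The text does not say what happens if Grove is the only option and
    # the annotation disables it; this sketch still returns "grove".
    if grove_available:
        return "grove"
    if lws_available:
        return "lws"
    return None


def describe_scheduling(
    orchestrator: str, annotations: dict, kai_available: bool
) -> dict:
    """Hypothetical helper describing which scheduler pairs with each orchestrator."""
    if orchestrator == "grove":
        if kai_available:
            queue = annotations.get(QUEUE_ANNOTATION, DEFAULT_QUEUE)
            return {"scheduler": "kai-scheduler", "queue": queue}
        # Assumption: without KAI-Scheduler, no special scheduler is used.
        return {"scheduler": "default"}
    if orchestrator == "lws":
        return {"scheduler": "volcano"}
    raise ValueError(f"unknown orchestrator: {orchestrator!r}")
```

For example, `select_orchestrator({"nvidia.com/enable-grove": "false"}, True, True)` returns `"lws"`, while an empty annotation map with both orchestrators installed returns `"grove"`.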
#### Configuration Examples:
**Default (Grove with KAI-Scheduler):**
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-multinode-deployment
  annotations:
    nvidia.com/kai-scheduler-queue: "gpu-intensive" # Optional: defaults to "dynamo"
spec:
  # ... your deployment spec
```
**Force LWS usage:**
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-multinode-deployment
  annotations:
    nvidia.com/enable-grove: "false"
spec:
  # ... your deployment spec
```
### The `multinode` Section
The `multinode` section in a resource specification defines how many physical nodes the workload should span:
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-multinode-deployment
spec:
  # ... your deployment spec
  services:
    my-service:
      ...
      multinode:
        nodeCount: 2
      resources:
        limits:
          gpu: "2" # 2 GPUs per node
```
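
As a quick sanity check on the numbers in this example, the total GPU count for a service is the node count multiplied by the per-node GPU limit. The helper below is a hypothetical illustration (the function name and the plain-dict representation of the service are assumptions for this sketch, not part of Dynamo):

```python
def total_gpus(service: dict) -> int:
    """Total GPUs for a service: nodeCount times the per-node GPU limit."""
    node_count = int(service["multinode"]["nodeCount"])
    # The GPU limit is written as a quoted string in the YAML, so convert it.
    gpus_per_node = int(service["resources"]["limits"]["gpu"])
    if node_count < 1 or gpus_per_node < 1:
        raise ValueError("nodeCount and gpu limit must both be at least 1")
    return node_count * gpus_per_node


# Mirrors the `my-service` entry in the YAML above.
my_service = {
    "multinode": {"nodeCount": 2},
    "resources": {"limits": {"gpu": "2"}},
}
print(total_gpus(my_service))  # prints 4
```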
...
@@ -88,16 +151,28 @@ The tensor parallelism (`tp-size` or `--tp`) in your command/args must match the