feat: extract deploymentType as interface (#2405)

Unverified Commit 72128596 authored Aug 15, 2025 by julienmancuso Committed by GitHub Aug 15, 2025
3 changed files
--- a/deploy/cloud/operator/internal/dynamo/grove.go
+++ b/deploy/cloud/operator/internal/dynamo/grove.go
+package dynamo
+import (
+	"fmt"
+	commonconsts "github.com/ai-dynamo/dynamo/deploy/cloud/operator/internal/consts"
+)
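+// GroveMultinodeDeployer implements the multinode hostname and rank helpers for Grove.
+type GroveMultinodeDeployer struct {
+	MultinodeDeployer
+}
+// GetLeaderHostname returns the leader pod's hostname. The ${GROVE_*} references are
+// emitted verbatim and are expected to be expanded later by the container's shell.
+func (d *GroveMultinodeDeployer) GetLeaderHostname(serviceName string) string {
+	return fmt.Sprintf("${GROVE_PCSG_NAME}-${GROVE_PCSG_INDEX}-%s-%s-0.${GROVE_HEADLESS_SERVICE}", serviceName, commonconsts.GroveRoleSuffixLeader)
+}
+// GetNodeRank returns a shell arithmetic expression: the worker pod index plus one,
+// which matches GetHostNames listing the leader first.
+func (d *GroveMultinodeDeployer) GetNodeRank() string {
+	return "$((GROVE_PCLQ_POD_INDEX + 1))"
+}
+// GetHostNames returns the leader hostname followed by numberOfNodes-1 worker hostnames.
+func (d *GroveMultinodeDeployer) GetHostNames(serviceName string, numberOfNodes int32) []string {
+	hostnames := make([]string, 0, numberOfNodes)
+	leaderHostname := d.GetLeaderHostname(serviceName)
+	hostnames = append(hostnames, leaderHostname)
+	// Add worker hostnames; worker pods are indexed from 0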
+	for i := int32(0); i < numberOfNodes-1; i++ {
+		workerHostname := fmt.Sprintf("${GROVE_PCSG_NAME}-${GROVE_PCSG_INDEX}-%s-%s-%d.${GROVE_HEADLESS_SERVICE}",
+			serviceName, commonconsts.GroveRoleSuffixWorker, i)
+		hostnames = append(hostnames, workerHostname)
+	}
+	return hostnames
+}
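To make the hostname layout produced by `grove.go` concrete, here is a minimal standalone sketch of the same string building. The suffix values `ldr` and `wkr` are hypothetical stand-ins for `commonconsts.GroveRoleSuffixLeader` and `commonconsts.GroveRoleSuffixWorker`, whose real values are not part of this diff, and `groveHostNames` is an illustrative name rather than a function in the operator.

```go
package main

import "fmt"

// Hypothetical stand-ins for commonconsts.GroveRoleSuffixLeader and
// commonconsts.GroveRoleSuffixWorker; the real values are not shown in this diff.
const (
	roleSuffixLeader = "ldr"
	roleSuffixWorker = "wkr"
)

// groveHostNames mirrors the shape of GroveMultinodeDeployer.GetHostNames:
// the leader comes first, followed by numberOfNodes-1 workers indexed from 0.
// The ${GROVE_*} references are kept verbatim, exactly as in the diff.
func groveHostNames(serviceName string, numberOfNodes int32) []string {
	const prefix = "${GROVE_PCSG_NAME}-${GROVE_PCSG_INDEX}"
	const domain = "${GROVE_HEADLESS_SERVICE}"

	hostnames := make([]string, 0, numberOfNodes)
	hostnames = append(hostnames,
		fmt.Sprintf("%s-%s-%s-0.%s", prefix, serviceName, roleSuffixLeader, domain))
	for i := int32(0); i < numberOfNodes-1; i++ {
		hostnames = append(hostnames,
			fmt.Sprintf("%s-%s-%s-%d.%s", prefix, serviceName, roleSuffixWorker, i, domain))
	}
	return hostnames
}

func main() {
	// Print the three hostnames generated for a 3-node service named "worker".
	for _, h := range groveHostNames("worker", 3) {
		fmt.Println(h)
	}
}
```

Because worker indices start at 0 while the leader occupies the first slot, a worker with pod index `i` sits at position `i+1` in the list, which is consistent with `GetNodeRank` returning the pod index plus one.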
--- a/deploy/cloud/operator/internal/dynamo/lws.go
+++ b/deploy/cloud/operator/internal/dynamo/lws.go
+package dynamo
+import "fmt"
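+// LWSMultinodeDeployer implements the multinode hostname and rank helpers for LWS.
+type LWSMultinodeDeployer struct {
+	MultinodeDeployer
+}
+// GetLeaderHostname returns a reference to LWS_LEADER_ADDRESS; serviceName is unused.
+func (d *LWSMultinodeDeployer) GetLeaderHostname(serviceName string) string {
+	return "${LWS_LEADER_ADDRESS}"
+}
+// GetNodeRank returns a reference to LWS_WORKER_INDEX.
+func (d *LWSMultinodeDeployer) GetNodeRank() string {
+	return "${LWS_WORKER_INDEX}"
+}
+// GetHostNames returns the leader address followed by one address reference per worker.
+// numberOfNodes must be at least 1, because index 0 is always written.
+func (d *LWSMultinodeDeployer) GetHostNames(serviceName string, numberOfNodes int32) []string {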
+	hostnames := make([]string, numberOfNodes)
+	hostnames[0] = d.GetLeaderHostname(serviceName)
+	for i := int32(1); i < numberOfNodes; i++ {
+		hostnames[i] = fmt.Sprintf("${LWS_WORKER_%d_ADDRESS}", i)
+	}
+	return hostnames
+}
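The `MultinodeDeployer` interface that both structs embed is defined outside the files shown here. The sketch below assumes its shape from the three methods that both deployers implement, and shows how a caller could stay independent of the deployment type. `lwsDeployer`, `describe`, and the output format are hypothetical names for illustration, not part of the operator; the guard for a non-positive node count is also an addition that the committed code does not have.

```go
package main

import "fmt"

// MultinodeDeployer is an assumed shape for the extracted interface, inferred
// from the methods that both deployers in this commit implement.
type MultinodeDeployer interface {
	GetLeaderHostname(serviceName string) string
	GetNodeRank() string
	GetHostNames(serviceName string, numberOfNodes int32) []string
}

// lwsDeployer reproduces the LWS behavior from the diff in a standalone type.
type lwsDeployer struct{}

func (lwsDeployer) GetLeaderHostname(string) string { return "${LWS_LEADER_ADDRESS}" }

func (lwsDeployer) GetNodeRank() string { return "${LWS_WORKER_INDEX}" }

func (d lwsDeployer) GetHostNames(serviceName string, numberOfNodes int32) []string {
	// Added guard (not in the diff): avoid indexing into an empty slice.
	if numberOfNodes <= 0 {
		return nil
	}
	hostnames := make([]string, numberOfNodes)
	hostnames[0] = d.GetLeaderHostname(serviceName)
	for i := int32(1); i < numberOfNodes; i++ {
		hostnames[i] = fmt.Sprintf("${LWS_WORKER_%d_ADDRESS}", i)
	}
	return hostnames
}

// describe is a hypothetical caller that depends only on the interface, so the
// same code path would work for a Grove-backed implementation as well.
func describe(d MultinodeDeployer, serviceName string, nodes int32) string {
	return fmt.Sprintf("leader=%s nodes=%d rank=%s",
		d.GetLeaderHostname(serviceName), nodes, d.GetNodeRank())
}

func main() {
	var d MultinodeDeployer = lwsDeployer{}
	fmt.Println(describe(d, "worker", 2))
	fmt.Println(d.GetHostNames("worker", 3))
}
```

Depending on the interface rather than on a deployment-type switch means that adding another orchestrator would only require a new implementation, assuming callers are written in this style.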
--- a/docs/guides/dynamo_deploy/multinode-deployment.md
+++ b/docs/guides/dynamo_deploy/multinode-deployment.md
+# Multinode Deployment Guide
+This guide explains how to deploy Dynamo workloads across multiple nodes. Multinode deployments enable you to scale compute-intensive LLM workloads across multiple physical machines, maximizing GPU utilization and supporting larger models.
+## Overview
+Dynamo supports multinode deployments through the `multinode` section in resource specifications. This allows you to:
+- Distribute workloads across multiple physical nodes
+- Scale GPU resources beyond a single machine
+- Support large models requiring extensive tensor parallelism
+- Achieve high availability and fault tolerance
+## Basic Requirements
+- **Kubernetes Cluster**: Version 1.24 or later
+- **GPU Nodes**: Multiple nodes with NVIDIA GPUs
+- **High-Speed Networking**: InfiniBand, RoCE, or high-bandwidth Ethernet (recommended for optimal performance)
+### Advanced Multinode Orchestration
+#### Using Grove (default)
+For sophisticated multinode deployments, Dynamo integrates with advanced Kubernetes orchestration systems:
+- **[Grove](https://github.com/NVIDIA/grove)**: Network topology-aware gang scheduling and auto-scaling for AI workloads
+- (optional) **[KAI-Scheduler](https://github.com/NVIDIA/KAI-Scheduler)**: Kubernetes native scheduler optimized for AI workloads at scale
+These systems provide enhanced scheduling capabilities including topology-aware placement, gang scheduling, and coordinated auto-scaling across multiple nodes.
+**Features Enabled with Grove:**
+- Hierarchical gang scheduling with `PodGangSet` and `PodClique`
+- Multi-level horizontal auto-scaling
+- Custom startup ordering for components
+- Resource-aware rolling updates
+[KAI-Scheduler](https://github.com/NVIDIA/KAI-Scheduler) is an optional enhancement that provides a Kubernetes native scheduler optimized for AI workloads at large scale.
+**Features Enabled with KAI-Scheduler:**
+- Network topology-aware pod placement
+- AI workload-optimized scheduling algorithms
+- GPU resource awareness and allocation
+- Support for complex scheduling constraints
+- Integration with Grove for enhanced capabilities
+- Performance optimizations for large-scale deployments
+#### Using LWS and Volcano
+LWS (LeaderWorkerSet) is a simpler multinode deployment mechanism that runs a workload as one leader pod plus worker pods across multiple nodes.
+- **LWS**: [LWS Installation](https://github.com/kubernetes-sigs/lws#installation)
+- **Volcano**: [Volcano Installation](https://volcano.sh/docs/installation/install-volcano/)
+Volcano is a Kubernetes-native batch scheduler. Used together with LWS, it provides gang scheduling, so that all pods of a multinode workload are scheduled together.
+## Core Concepts
+### The `multinode` Section
+The `multinode` section in a resource specification defines how many physical nodes the workload should span:
+```yaml
+multinode:
+  nodeCount: 2
+resources:
+  requests:
+    cpu: "10"
+    memory: "40Gi"
+  limits:
+    cpu: "10"
+    memory: "40Gi"
+    gpu: "2"            # 2 GPUs per node
+```
+### GPU Distribution
+The relationship between `multinode.nodeCount` and `gpu` is multiplicative:
+- **`multinode.nodeCount`**: Number of physical nodes
+- **`gpu`**: Number of GPUs per node
+- **Total GPUs**: `multinode.nodeCount × gpu`
+**Examples:**
+- `multinode.nodeCount: 2` × `gpu: "4"` = 8 total GPUs (4 GPUs per node across 2 nodes)
+- `multinode.nodeCount: 4` × `gpu: "8"` = 32 total GPUs (8 GPUs per node across 4 nodes)
+### Tensor Parallelism Alignment
+The tensor parallelism (`tp-size` or `--tp`) in your command/args must match the total number of GPUs:
+```yaml
+# Example: nodeCount 2 × 4 GPUs per node = 8 total GPUs
+multinode:
+  nodeCount: 2
+resources:
+  limits:
+    gpu: "4"
+# Command args must use tp-size=8
+args:
+  - "--tp-size"
+  - "8"  # Must equal multinode.nodeCount × gpu
+```
+## Next Steps
+For additional support and examples, see the working multinode configurations in:
+- **SGLang**: [components/backends/sglang/deploy/](../../components/backends/sglang/deploy/)
+- **TensorRT-LLM**: [components/backends/trtllm/deploy/](../../components/backends/trtllm/deploy/)
+- **vLLM**: [components/backends/vllm/deploy/](../../components/backends/vllm/deploy/)
+These examples demonstrate proper usage of the `multinode` section with corresponding `gpu` limits and correct `tp-size` configuration.
\ No newline at end of file
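The GPU arithmetic from the "GPU Distribution" and "Tensor Parallelism Alignment" sections of the guide can be checked mechanically. The following is a minimal sketch, assuming the three values have already been read from a spec; `totalGPUs` and `checkTPSize` are hypothetical helpers, not functions provided by Dynamo.

```go
package main

import "fmt"

// totalGPUs applies the rule from the guide: multinode.nodeCount × gpu.
func totalGPUs(nodeCount, gpusPerNode int) int {
	return nodeCount * gpusPerNode
}

// checkTPSize returns an error when tp-size does not equal the total GPU count.
// It is a hypothetical validation helper, not part of the operator.
func checkTPSize(nodeCount, gpusPerNode, tpSize int) error {
	if nodeCount < 1 || gpusPerNode < 1 {
		return fmt.Errorf("nodeCount and gpu must both be at least 1, got %d and %d",
			nodeCount, gpusPerNode)
	}
	if want := totalGPUs(nodeCount, gpusPerNode); tpSize != want {
		return fmt.Errorf("tp-size %d does not match %d total GPUs (%d nodes × %d GPUs per node)",
			tpSize, want, nodeCount, gpusPerNode)
	}
	return nil
}

func main() {
	// The configuration from the guide: 2 nodes × 4 GPUs with tp-size 8.
	fmt.Println(totalGPUs(2, 4))
	fmt.Println(checkTPSize(2, 4, 8))

	// A mismatched configuration: 4 nodes × 8 GPUs but tp-size 8.
	fmt.Println(checkTPSize(4, 8, 8))
}
```

A check like this could run before a manifest is applied, which would surface a mismatched `tp-size` earlier than a failed backend launch would.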