By default, Dynamo discovers endpoints and model cards through etcd. An experimental Kubernetes discovery backend uses native Kubernetes EndpointSlices instead, which removes the dependency on etcd.
**Using DynamoGraphDeployment (Recommended):**
When deploying with the Dynamo operator, add the following annotation to your DGD manifest:
```yaml
metadata:
  annotations:
    nvidia.com/dynamo-discovery-backend: kubernetes
```
The operator will automatically configure the required EndpointSlices, labels, and pod environment variables. See [`dgd.yaml`](./dgd.yaml) for a complete example.
## Environment Variables
| **Variable** | **Description** | **Default** |
| ------------ | --------------- | ----------- |
| `DYN_DISCOVERY_BACKEND` | Discovery backend (`kv_store` for etcd or `kubernetes` for experimental EndpointSlice-based discovery) | `kv_store` |
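
As a rough illustration of how a process might read this variable, the following Python sketch applies the default and accepts only the two values listed in the table. The helper name `resolve_discovery_backend` is hypothetical, and rejecting unknown values is an assumption of this sketch, not documented Dynamo behavior.

```python
import os

# The two values listed in the table above.
VALID_BACKENDS = {"kv_store", "kubernetes"}


def resolve_discovery_backend(env=None):
    """Return the configured discovery backend, defaulting to kv_store.

    Hypothetical helper: raising on unknown values is an assumption of
    this sketch, not something the documentation specifies.
    """
    env = os.environ if env is None else env
    backend = env.get("DYN_DISCOVERY_BACKEND", "kv_store")
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unsupported DYN_DISCOVERY_BACKEND: {backend!r}")
    return backend


if __name__ == "__main__":
    print(resolve_discovery_backend())
```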
## Metadata Endpoint
The Kubernetes backend exposes a `/metadata` endpoint on each pod that returns the pod's registered discovery information. The system status server uses this endpoint to expose discovery information to clients on the discovery plane.
### Example Request
```bash
curl -s localhost:9090/metadata | jq
```
### Example Response
```json
{
  "endpoints": {
    "vllm-disagg/backend/generate": {
      "component": "backend",
      "endpoint": "generate",
      "instance_id": 12345678901234567890,
      "namespace": "vllm-disagg",
      "transport": {
        "nats_tcp": "vllm-disagg_backend.generate-abc123"
      }
    }
  },
  "model_cards": {}
}
```
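
A client could consume this payload along the following lines. This is a minimal sketch that parses the example response shown above from a string without making an HTTP request; the function name `list_endpoints` is hypothetical.

```python
import json

# Payload copied from the example response above.
SAMPLE = """
{
  "endpoints": {
    "vllm-disagg/backend/generate": {
      "component": "backend",
      "endpoint": "generate",
      "instance_id": 12345678901234567890,
      "namespace": "vllm-disagg",
      "transport": {"nats_tcp": "vllm-disagg_backend.generate-abc123"}
    }
  },
  "model_cards": {}
}
"""


def list_endpoints(payload):
    """Return (path, instance_id, transport kinds) for each registered endpoint."""
    doc = json.loads(payload)
    rows = []
    for path, ep in sorted(doc.get("endpoints", {}).items()):
        transports = ",".join(sorted(ep.get("transport", {})))
        rows.append((path, ep["instance_id"], transports))
    return rows


for path, instance_id, transports in list_endpoints(SAMPLE):
    print(path, instance_id, transports)
```

The example `instance_id` fits in an unsigned 64-bit integer but is larger than 2^53, so parsers that decode JSON numbers as 64-bit floats (JavaScript's `JSON.parse`, for instance) may round it. Python's `json` module keeps the value exact.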
For documentation on Dynamo's service discovery system, see the [Service Discovery Guide](../../docs/kubernetes/service_discovery.md).
You can run the Frontend on one machine (e.g., a CPU node) and workers on different machines (GPU nodes). The Frontend serves as a framework-agnostic HTTP entry point that:
<!--
SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Service Discovery
Dynamo components (frontends, workers, planner) need to discover each other and their capabilities at runtime. We refer to this as service discovery. Two service discovery backends are supported on Kubernetes.
## Discovery Backends
| Backend | Default | Dependencies | Use Case |
|---------|---------|--------------|----------|
| **Kubernetes** | ✅ Yes | None (native K8s) | Recommended for all Kubernetes deployments |
| **KV Store (etcd)** | No | etcd cluster | Legacy deployments |
## Kubernetes Discovery (Default)
Kubernetes discovery is the default and recommended backend when running on Kubernetes. It uses native Kubernetes primitives to facilitate discovery of components:
- **DynamoWorkerMetadata CRD**: Each worker stores its registered endpoints and model cards in a custom resource
- **EndpointSlices**: EndpointSlices signal each component's readiness status
### Implementation Details
Each pod runs a **discovery daemon** that watches both EndpointSlices and DynamoWorkerMetadata CRs. A pod is only discoverable when it appears as "ready" in an EndpointSlice AND has a corresponding `DynamoWorkerMetadata` CR. This correlation ensures pods aren't discoverable until they're ready, metadata is immediately available, and stale entries are cleaned up when pods terminate.
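
The correlation rule can be sketched in a few lines of Python. This is an illustrative model and not the daemon's actual code: it assumes the watcher has already reduced EndpointSlices to a set of ready pod names and the CRs to a mapping keyed by pod name, and the function name `discoverable_pods` is hypothetical.

```python
def discoverable_pods(ready_pods, metadata_by_pod):
    """Return metadata only for pods that are ready AND have a metadata CR."""
    return {
        pod: metadata_by_pod[pod]
        for pod in sorted(ready_pods)
        if pod in metadata_by_pod
    }


# Pods reported ready by EndpointSlices.
ready = {"worker-a", "worker-b"}

# Pods that have a DynamoWorkerMetadata CR.
metadata = {
    "worker-b": {"endpoints": ["dynamo/backend/generate"]},
    "worker-c": {"endpoints": ["dynamo/backend/generate"]},
}

# worker-a is ready but has no CR; worker-c has a CR but is not ready.
print(discoverable_pods(ready, metadata))
```

Only `worker-b` satisfies both conditions, so it is the only pod in the result.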
#### DynamoWorkerMetadata CRD
Each worker pod creates a `DynamoWorkerMetadata` CR that stores its discovery metadata:
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoWorkerMetadata
metadata:
  name: my-worker-pod-abc123
  namespace: dynamo-system
  ownerReferences:
    - apiVersion: v1
      kind: Pod
      name: my-worker-pod-abc123
      uid: <pod-uid>
      controller: true
spec:
  data:
    endpoints:
      "dynamo/backend/generate":
        type: Endpoint
        namespace: dynamo
        component: backend
        endpoint: generate
        instance_id: 12345678901234567890
        transport:
          nats_tcp: "dynamo_backend.generate-abc123"
    model_cards: {}
```
The CR is named after the pod and includes an owner reference for automatic garbage collection when the pod is deleted.
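
To make the naming and ownership rule concrete, here is a hedged Python sketch that assembles such a manifest as a plain dictionary. It assumes the pod identity comes from the `POD_NAME`, `POD_NAMESPACE`, and `POD_UID` environment variables that the operator injects, and that the payload is nested under `spec.data` as in the example manifest. The function name `build_worker_metadata` is hypothetical, and the sketch does not talk to the Kubernetes API.

```python
import os


def build_worker_metadata(endpoints, model_cards=None, env=None):
    """Build a DynamoWorkerMetadata manifest named after the current pod.

    Hypothetical helper: field layout follows the example manifest; the
    real worker may construct the resource differently.
    """
    env = os.environ if env is None else env
    pod_name = env["POD_NAME"]
    return {
        "apiVersion": "nvidia.com/v1alpha1",
        "kind": "DynamoWorkerMetadata",
        "metadata": {
            # The CR shares the pod's name and namespace.
            "name": pod_name,
            "namespace": env["POD_NAMESPACE"],
            # Owner reference so the CR is garbage-collected with the pod.
            "ownerReferences": [
                {
                    "apiVersion": "v1",
                    "kind": "Pod",
                    "name": pod_name,
                    "uid": env["POD_UID"],
                    "controller": True,
                }
            ],
        },
        "spec": {
            "data": {
                "endpoints": endpoints,
                "model_cards": model_cards or {},
            }
        },
    }


example_env = {
    "POD_NAME": "my-worker-pod-abc123",
    "POD_NAMESPACE": "dynamo-system",
    "POD_UID": "00000000-0000-0000-0000-000000000000",  # placeholder UID
}
manifest = build_worker_metadata({}, env=example_env)
print(manifest["metadata"]["name"])
```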
#### EndpointSlices
While DynamoWorkerMetadata resources provide an up-to-date snapshot of a component's capabilities, EndpointSlices provide a snapshot of the health of the Dynamo components.
The operator creates a Kubernetes Service targeting the Dynamo components. The Kubernetes controller in turn creates and maintains EndpointSlice resources that keep track of the readiness of the pods targeted by the Service. Watching these slices gives us an up-to-date snapshot of which Dynamo components are ready to serve traffic.
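
As an illustration of what a watcher reads from a slice, the sketch below extracts the names of ready pods from an EndpointSlice represented as a dictionary. The `endpoints[].conditions.ready` and `endpoints[].targetRef` fields are part of the `discovery.k8s.io/v1` EndpointSlice schema; the function name `ready_pod_names` and the sample data are hypothetical.

```python
def ready_pod_names(endpoint_slice):
    """Return the names of pods that an EndpointSlice reports as ready.

    This sketch is conservative: an endpoint whose `ready` condition is
    unset is treated as not ready.
    """
    names = set()
    for ep in endpoint_slice.get("endpoints") or []:
        conditions = ep.get("conditions") or {}
        target = ep.get("targetRef") or {}
        if conditions.get("ready") is True and target.get("kind") == "Pod":
            names.add(target["name"])
    return names


# Hypothetical slice with one ready and one not-ready endpoint.
sample_slice = {
    "apiVersion": "discovery.k8s.io/v1",
    "kind": "EndpointSlice",
    "endpoints": [
        {
            "addresses": ["10.0.0.1"],
            "conditions": {"ready": True},
            "targetRef": {"kind": "Pod", "name": "worker-a"},
        },
        {
            "addresses": ["10.0.0.2"],
            "conditions": {"ready": False},
            "targetRef": {"kind": "Pod", "name": "worker-b"},
        },
    ],
}

print(ready_pod_names(sample_slice))
```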
##### Readiness Probes
A pod is marked ready when its readiness probe succeeds. On Dynamo workers, the probe succeeds once the `generate` endpoint is available and healthy. The Dynamo operator configures these probes for each pod/component.
#### RBAC
Each Dynamo component pod is automatically given a ServiceAccount that allows it to watch `EndpointSlice` and `DynamoWorkerMetadata` resources within its namespace.
#### Environment Variables
The following environment variables are automatically injected into pods by the operator to facilitate service discovery:
| Variable | Description |
|----------|-------------|
| `DYN_DISCOVERY_BACKEND` | Set to `kubernetes` |
| `POD_NAME` | Pod name (via downward API) |
| `POD_NAMESPACE` | Pod namespace (via downward API) |
| `POD_UID` | Pod UID (via downward API) |
The pod's instance ID is deterministically generated by hashing the pod name, ensuring consistent identity and correlation between EndpointSlices and CRs.
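
The idea of a deterministic, name-derived ID can be sketched as follows. The text above does not state which hash function Dynamo uses, so SHA-256 truncated to 64 bits is purely an assumption made for illustration, and `instance_id_from_pod_name` is a hypothetical name. Values produced by this sketch will not match real Dynamo instance IDs.

```python
import hashlib


def instance_id_from_pod_name(pod_name):
    """Derive a stable unsigned 64-bit ID from a pod name.

    Assumption: SHA-256 truncated to 8 bytes stands in for whatever hash
    Dynamo actually uses.
    """
    digest = hashlib.sha256(pod_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# The same name always maps to the same ID, in any process or on any node.
first = instance_id_from_pod_name("my-worker-pod-abc123")
second = instance_id_from_pod_name("my-worker-pod-abc123")
print(first == second)
```

Because the ID depends only on the pod name, any observer that knows the name from an EndpointSlice `targetRef` can compute the same ID that appears in the pod's CR without extra coordination.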
## KV Store Discovery (etcd)
To use etcd-based discovery instead of Kubernetes-native discovery, add the following annotation to your DynamoGraphDeployment:
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-deployment
  annotations:
    nvidia.com/dynamo-discovery-backend: etcd
spec:
  services:
    # ...
```
This requires an etcd cluster to be available. The etcd connection is configured via the platform Helm chart.