Unverified commit ad3a46a6, authored by Julien Mancuso, committed by GitHub

feat: deprecated namespace-restricted mode (#7934)

parent 0078e283
@@ -85,10 +85,11 @@ kubectl create secret generic hf-token-secret \
 ### Step 1.3: Install Dynamo Platform
-If your cluster uses namespace-restricted Dynamo operators, you'll need to install the Dynamo platform in the workload namespace. Follow the [Dynamo Kubernetes Installation Guide](https://github.com/ai-dynamo/dynamo/blob/main/docs/kubernetes/installation-guide.md) to install the platform in `dynamo-bench`.
+Follow the [Dynamo Kubernetes Installation Guide](https://github.com/ai-dynamo/dynamo/blob/main/docs/kubernetes/installation-guide.md) to install the platform in `dynamo-bench`.
+> **Note:** Namespace-restricted mode (`namespaceRestriction.enabled=true`) is deprecated and will be removed in a future release. Use cluster-wide mode for new deployments.
 **Key Configuration Notes:**
-- If your cluster uses namespace restrictions, ensure `dynamo-operator.namespaceRestriction.enabled=true` is set during installation
 - Adjust version tags to match your cluster's available Dynamo versions
 - If you encounter operator compatibility issues (e.g., unsupported MPI arguments), consult your cluster administrator or the Dynamo troubleshooting documentation
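A cluster-wide install, as the deprecation note recommends, might look like the helper below. This is a minimal sketch, not the installation guide's official procedure. The chart file name and the `helm install` flags follow the install command that appears elsewhere in this change, and the `dynamo-operator.namespaceRestriction.enabled` key comes from the configuration note on namespace restrictions. Setting that key explicitly to `false` is an assumption, since cluster-wide mode is described as the default. The function name `cluster_wide_install_cmd` is hypothetical. The helper only prints the command so it can be reviewed before anything is run.

```bash
# Hypothetical helper: print (do not run) a Helm command for a cluster-wide install.
# Assumption: cluster-wide mode is already the default, so the --set flag is
# redundant and only makes the choice explicit.
#   $1: chart version (for example, the value of RELEASE_VERSION)
#   $2: target namespace (for example, dynamo-bench)
cluster_wide_install_cmd() {
  printf 'helm install dynamo-platform dynamo-platform-%s.tgz --namespace %s --create-namespace --set dynamo-operator.namespaceRestriction.enabled=false\n' \
    "$1" "$2"
}
```

Calling `cluster_wide_install_cmd "${RELEASE_VERSION}" dynamo-bench` prints the full command line for inspection.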
......
@@ -332,10 +332,13 @@ See [AI Configurator documentation](https://github.com/ai-dynamo/aiconfigurator#
 The operator automatically discovers GPU resources from cluster nodes, providing hardware info (GPU model, VRAM, GPUs per node) and automatic profiling search space calculation.
 **Requirements:**
-- **Cluster-scoped operators**: Have node read permissions by default
-- **Namespace-scoped operators**: GPU discovery is enabled by default when installing via Helm; the chart provisions the required ClusterRole/ClusterRoleBinding automatically
-**For namespace-scoped operators**, GPU discovery is controlled by a Helm value:
+- **Cluster-scoped operators** (recommended): Have node read permissions by default. GPU discovery works automatically.
+> **DEPRECATED:** The following applies only to namespace-scoped operators, which are deprecated and will be removed in a future release. Use cluster-wide mode for new deployments.
+- **Namespace-scoped operators** (deprecated): GPU discovery is enabled by default when installing via Helm; the chart provisions the required ClusterRole/ClusterRoleBinding automatically
+**For namespace-scoped operators (deprecated)**, GPU discovery is controlled by a Helm value:
```bash ```bash
# GPU discovery enabled (default) — Helm provisions read-only node access automatically # GPU discovery enabled (default) — Helm provisions read-only node access automatically
......
@@ -107,7 +107,7 @@ For the full spec reference, see the [DGDR API Reference](api-reference.md) and
 [Profiler Guide](../components/profiler/profiler-guide.md).
 > [!IMPORTANT]
-> If you are using a **namespace-scoped operator** with GPU discovery disabled, you must also
+> If you are using a **namespace-scoped operator** (deprecated) with GPU discovery disabled, you must also
 > provide explicit hardware info or the DGDR will be rejected at admission:
 >
 > ```yaml
@@ -119,8 +119,10 @@ For the full spec reference, see the [DGDR API Reference](api-reference.md) and
 > vramMb: 81920
 > ```
 >
-> See the [installation guide](installation-guide.md#gpu-discovery-for-dynamographdeploymentrequests-with-namespace-scoped-operators)
+> See the [installation guide](installation-guide.md#gpu-discovery-for-dynamographdeploymentrequests-deprecated-namespace-scoped-mode)
 > for details.
+>
+> **Note:** Namespace-scoped mode is deprecated. Use cluster-wide mode for new deployments.
 ## Step 3: Monitor Profiling Progress
@@ -269,7 +271,7 @@ kubectl describe dynamographdeploymentrequest qwen3-first-model -n ${NAMESPACE}
 Common causes: no available GPU nodes, image pull failure (check image tag; NGC credentials are
 optional but may be needed if you hit rate limits pulling from public NGC), missing `hardware`
-config for a namespace-scoped operator.
+config for a namespace-scoped operator (deprecated).
 > [!TIP]
 > **GPU node taints** are a frequent cause of pods staying `Pending`. Many clusters (including
......
@@ -28,7 +28,7 @@ Dynamo operator is a Kubernetes operator that simplifies the deployment, configu
 The Dynamo operator supports three deployment modes to accommodate different cluster environments and use cases:
-### 1. Cluster-Wide Mode (Default)
+### 1. Cluster-Wide Mode (Default, Recommended)
 The operator monitors and manages DynamoGraph resources across **all namespaces** in the cluster.
@@ -39,7 +39,9 @@ The operator monitors and manages DynamoGraph resources across **all namespaces*
 ---
-### 2. Namespace-Scoped Mode
+### 2. Namespace-Scoped Mode (DEPRECATED)
+> **DEPRECATED:** Namespace-scoped mode (`namespaceRestriction.enabled=true`) is deprecated and will be removed in a future release. Use cluster-wide mode instead. Do not use this for new deployments.
 The operator monitors and manages DynamoGraph resources **only in a specific namespace**. A lease marker is created to signal the operator's presence to any cluster-wide operators.
@@ -59,7 +61,9 @@ helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
 ---
-### 3. Hybrid Mode
+### 3. Hybrid Mode (DEPRECATED)
+> **DEPRECATED:** Hybrid mode relies on namespace-scoped operators, which are deprecated and will be removed in a future release. Use a single cluster-wide operator instead.
 A **cluster-wide operator** manages most namespaces, while **one or more namespace-scoped operators** run in specific namespaces (e.g., for testing new versions). The cluster-wide operator automatically detects and excludes namespaces with namespace-scoped operators using lease markers.
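To see whether a namespace still carries a lease marker from a deprecated namespace-scoped operator, one option is to list its Lease objects. This is a sketch under stated assumptions: it assumes the marker is a standard Kubernetes `coordination.k8s.io` Lease created in the namespace the scoped operator manages. The marker's name is not given in this change, so the helper lists every lease in the namespace rather than guessing one. The function name `list_namespace_leases` is hypothetical.

```bash
# Hypothetical helper: list all Lease objects in one namespace.
# Assumption: the operator's lease marker is a regular Kubernetes Lease
# that lives in the namespace managed by the namespace-scoped operator.
#   $1: namespace to inspect
list_namespace_leases() {
  kubectl get leases --namespace "$1"
}
```

For example, `list_namespace_leases "${NAMESPACE}"` shows the leases in the workload namespace. Which entry, if any, belongs to the Dynamo operator has to be confirmed against the operator documentation.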
@@ -128,7 +132,6 @@ The Dynamo Operator uses **Kubernetes admission webhooks** for real-time validat
 - ✅ Shared certificate infrastructure across all webhook types
 - ✅ Automatic certificate generation and rotation (default, all environments)
 - ✅ cert-manager integration (optional, for custom PKI)
-- ✅ Multi-operator support with lease-based coordination
 - ✅ Immutability enforcement for critical fields
 For complete documentation on webhooks, certificate management, and troubleshooting, see:
@@ -175,7 +178,7 @@ helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-$
 helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace
``` ```
-> **Note:** For shared/multi-tenant clusters or testing scenarios, see [Deployment Modes](#deployment-modes) above for namespace-scoped and hybrid configurations.
+> **Note:** Namespace-scoped and hybrid deployment modes are deprecated. Use cluster-wide mode for all new deployments. See [Deployment Modes](#deployment-modes) above if you need backward-compatible configurations.
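Before moving an existing installation to cluster-wide mode, it can help to check whether the release was installed with the deprecated setting. The helper below is a minimal sketch rather than an official migration step. It reads the JSON printed by `helm get values` on standard input and looks for a `namespaceRestriction` object whose `enabled` field is `true`. The function name `namespace_restricted` is hypothetical, and the release name `dynamo-platform` is taken from the install command above. The pattern match assumes no nested object appears before `enabled` inside `namespaceRestriction`.

```bash
# Hypothetical helper: succeed (exit 0) if Helm values JSON on stdin enables
# the deprecated namespace-restricted mode, fail (exit 1) otherwise.
# Whitespace is stripped first so compact and pretty-printed JSON both match.
# Assumption: "enabled" is not preceded by a nested object in namespaceRestriction.
namespace_restricted() {
  tr -d '[:space:]' |
    grep -Eq '"namespaceRestriction":\{[^{}]*"enabled":true'
}

# Example usage (requires access to the cluster, so it is left commented out):
#   if helm get values dynamo-platform --namespace "${NAMESPACE}" --output json \
#       | namespace_restricted; then
#     echo "Release uses deprecated namespace-restricted mode"
#   fi
```

Only user-supplied values are inspected here. Because cluster-wide mode is the default, a release that never set the flag produces no match and is reported as not restricted.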
 ### Building from Source
......