Unverified Commit eb73c2b0 authored by hhzhang16, committed by GitHub

feat: remove deploy/utils rbac (#3771)


Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
parent d81a00ef
...@@ -8,7 +8,6 @@ metadata: ...@@ -8,7 +8,6 @@ metadata:
spec: spec:
template: template:
spec: spec:
serviceAccountName: dynamo-sa
imagePullSecrets: imagePullSecrets:
- name: docker-imagepullsecret - name: docker-imagepullsecret
securityContext: securityContext:
......
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0 # SPDX-License-Identifier: Apache-2.0
# TODO: update to dgdr spec for AIC
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
......
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0 # SPDX-License-Identifier: Apache-2.0
# TODO: update to dgdr spec for online mode
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
......
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0 # SPDX-License-Identifier: Apache-2.0
# TODO: update to dgdr spec for MoE model
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
......
...@@ -17,9 +17,6 @@ This includes: ...@@ -17,9 +17,6 @@ This includes:
- `setup_benchmarking_resources.sh` — Sets up benchmarking and profiling resources in your existing Dynamo namespace - `setup_benchmarking_resources.sh` — Sets up benchmarking and profiling resources in your existing Dynamo namespace
- `manifests/` - `manifests/`
- `serviceaccount.yaml` — ServiceAccount `dynamo-sa` for benchmarking and profiling jobs
- `role.yaml` — Role `dynamo-role` with necessary permissions
- `rolebinding.yaml` — RoleBinding `dynamo-binding`
- `pvc.yaml` — PVC `dynamo-pvc` for storing profiler results and configurations - `pvc.yaml` — PVC `dynamo-pvc` for storing profiler results and configurations
- `pvc-access-pod.yaml` — short‑lived pod for copying profiler results from the PVC - `pvc-access-pod.yaml` — short‑lived pod for copying profiler results from the PVC
- `kubernetes.py` — helper used by tooling to apply/read resources (e.g., access pod for PVC downloads) - `kubernetes.py` — helper used by tooling to apply/read resources (e.g., access pod for PVC downloads)
...@@ -63,9 +60,6 @@ deploy/utils/setup_benchmarking_resources.sh ...@@ -63,9 +60,6 @@ deploy/utils/setup_benchmarking_resources.sh
This script applies the following manifests to your existing Dynamo namespace: This script applies the following manifests to your existing Dynamo namespace:
- `deploy/utils/manifests/serviceaccount.yaml` - ServiceAccount `dynamo-sa`
- `deploy/utils/manifests/role.yaml` - Role `dynamo-role`
- `deploy/utils/manifests/rolebinding.yaml` - RoleBinding `dynamo-binding`
- `deploy/utils/manifests/pvc.yaml` - PVC `dynamo-pvc` - `deploy/utils/manifests/pvc.yaml` - PVC `dynamo-pvc`
If `HF_TOKEN` is provided, it also creates a secret for HuggingFace model access. If `HF_TOKEN` is provided, it also creates a secret for HuggingFace model access.
...@@ -73,7 +67,6 @@ If `HF_TOKEN` is provided, it also creates a secret for HuggingFace model access ...@@ -73,7 +67,6 @@ If `HF_TOKEN` is provided, it also creates a secret for HuggingFace model access
After running the setup script, verify the resources by checking: After running the setup script, verify the resources by checking:
```bash ```bash
kubectl get serviceaccount dynamo-sa -n $NAMESPACE
kubectl get pvc dynamo-pvc -n $NAMESPACE kubectl get pvc dynamo-pvc -n $NAMESPACE
``` ```
...@@ -130,5 +123,4 @@ For complete benchmarking and profiling workflows: ...@@ -130,5 +123,4 @@ For complete benchmarking and profiling workflows:
## Notes ## Notes
- Profiling job manifest remains in `benchmarks/profiler/deploy/profile_sla_job.yaml` and relies on the ServiceAccount/PVC created by the setup script.
- This setup is focused on benchmarking and profiling resources only - the main Dynamo platform must be installed separately. - This setup is focused on benchmarking and profiling resources only - the main Dynamo platform must be installed separately.
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: dynamo-role
namespace: ${NAMESPACE}
rules:
# DynamoGraphDeployment custom resources - needed for create/get/delete operations
- apiGroups: ["nvidia.com"]
resources: ["dynamographdeployments"]
verbs: ["get", "create", "delete"]
# Pods - needed for listing pods by label selector and getting logs
- apiGroups: [""]
resources: ["pods"]
verbs: ["list"]
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get"]
# Services and Deployments - needed for vLLM deployments
- apiGroups: [""]
resources: ["services"]
verbs: ["get", "create", "delete"]
- apiGroups: ["apps"]
resources: ["deployments"]
verbs: ["get", "create", "delete"]
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: dynamo-binding
namespace: ${NAMESPACE}
subjects:
- kind: ServiceAccount
name: dynamo-sa
namespace: ${NAMESPACE}
roleRef:
kind: Role
name: dynamo-role
apiGroup: rbac.authorization.k8s.io
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: ServiceAccount
metadata:
name: dynamo-sa
namespace: ${NAMESPACE}
imagePullSecrets:
- name: nvcr-imagepullsecret
...@@ -30,7 +30,7 @@ Usage: ...@@ -30,7 +30,7 @@ Usage:
NAMESPACE=<ns> [HF_TOKEN=<token>] deploy/utils/setup_benchmarking_resources.sh NAMESPACE=<ns> [HF_TOKEN=<token>] deploy/utils/setup_benchmarking_resources.sh
Sets up benchmarking and profiling resources in an existing Dynamo namespace: Sets up benchmarking and profiling resources in an existing Dynamo namespace:
- Applies common manifests (ServiceAccount, Role, RoleBinding, PVC) - Applies common manifests (PVC)
- Creates HuggingFace token secret if HF_TOKEN provided - Creates HuggingFace token secret if HF_TOKEN provided
- Installs benchmark dependencies if requirements.txt exists - Installs benchmark dependencies if requirements.txt exists
...@@ -100,7 +100,6 @@ ok "Benchmarking resource setup complete" ...@@ -100,7 +100,6 @@ ok "Benchmarking resource setup complete"
# Verify installation # Verify installation
log "Verifying installation..." log "Verifying installation..."
kubectl get serviceaccount dynamo-sa -n "$NAMESPACE" >/dev/null && ok "ServiceAccount dynamo-sa exists" || err "ServiceAccount dynamo-sa not found"
kubectl get pvc dynamo-pvc -n "$NAMESPACE" >/dev/null && ok "PVC dynamo-pvc exists" || err "PVC dynamo-pvc not found" kubectl get pvc dynamo-pvc -n "$NAMESPACE" >/dev/null && ok "PVC dynamo-pvc exists" || err "PVC dynamo-pvc not found"
if [[ -n "$HF_TOKEN" ]]; then if [[ -n "$HF_TOKEN" ]]; then
......
...@@ -326,7 +326,7 @@ The server-side benchmarking solution: ...@@ -326,7 +326,7 @@ The server-side benchmarking solution:
## Prerequisites ## Prerequisites
1. **Kubernetes cluster** with NVIDIA GPUs and Dynamo namespace setup (see [Dynamo Cloud/Platform docs](/docs/kubernetes/README.md)) 1. **Kubernetes cluster** with NVIDIA GPUs and Dynamo namespace setup (see [Dynamo Cloud/Platform docs](/docs/kubernetes/README.md))
2. **Storage and service account** PersistentVolumeClaim and service account configured with appropriate permissions (see [deploy/utils README](../../deploy/utils/README.md)) 2. **Storage** PersistentVolumeClaim configured with appropriate permissions (see [deploy/utils README](../../deploy/utils/README.md))
3. **Docker image** containing the Dynamo benchmarking tools 3. **Docker image** containing the Dynamo benchmarking tools
## Quick Start ## Quick Start
...@@ -489,7 +489,6 @@ kubectl describe pod <pod-name> -n $NAMESPACE ...@@ -489,7 +489,6 @@ kubectl describe pod <pod-name> -n $NAMESPACE
### Common Issues ### Common Issues
1. **Service not found**: Ensure your DynamoGraphDeployment frontend service is running 1. **Service not found**: Ensure your DynamoGraphDeployment frontend service is running
2. **Service account permissions**: Verify `dynamo-sa` has necessary RBAC permissions
3. **PVC access**: Check that `dynamo-pvc` is properly configured and accessible 3. **PVC access**: Check that `dynamo-pvc` is properly configured and accessible
4. **Image pull issues**: Ensure the Docker image is accessible from the cluster 4. **Image pull issues**: Ensure the Docker image is accessible from the cluster
5. **Resource constraints**: Adjust resource limits if the job is being evicted 5. **Resource constraints**: Adjust resource limits if the job is being evicted
...@@ -500,9 +499,6 @@ kubectl describe pod <pod-name> -n $NAMESPACE ...@@ -500,9 +499,6 @@ kubectl describe pod <pod-name> -n $NAMESPACE
# Check PVC status # Check PVC status
kubectl get pvc dynamo-pvc -n $NAMESPACE kubectl get pvc dynamo-pvc -n $NAMESPACE
# Verify service account
kubectl get sa dynamo-sa -n $NAMESPACE
# Check service endpoints # Check service endpoints
kubectl get svc -n $NAMESPACE kubectl get svc -n $NAMESPACE
......
...@@ -217,7 +217,7 @@ If you see `ErrImagePull` or `ImagePullBackOff` errors with 401 unauthorized mes ...@@ -217,7 +217,7 @@ If you see `ErrImagePull` or `ImagePullBackOff` errors with 401 unauthorized mes
2. Verify the service account was created with the image pull secret: 2. Verify the service account was created with the image pull secret:
```bash ```bash
kubectl get serviceaccount dynamo-sa -n $NAMESPACE -o yaml kubectl get serviceaccount dgdr-profiling-job -n $NAMESPACE -o yaml
``` ```
3. The service account should show `imagePullSecrets` containing `nvcr-imagepullsecret`. 3. The service account should show `imagePullSecrets` containing `nvcr-imagepullsecret`.
......
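
Because this commit deletes the `dynamo-sa` ServiceAccount, the `dynamo-role` Role and the `dynamo-binding` RoleBinding, any manifest or doc that still names them would be stale. A minimal sketch of a check for leftover references follows; the function name `find_stale_rbac_refs` is hypothetical and not part of the commit, and the pattern only covers the three resource names that appear in the deleted manifests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not part of this commit):
# list files under a directory that still mention the RBAC resources
# removed here (dynamo-sa, dynamo-role, dynamo-binding).
find_stale_rbac_refs() {
  local root="${1:-.}"
  # -r: recurse, -l: print matching file names only,
  # -w: match whole words so that dynamo-pvc or dynamo-sample are ignored,
  # -E: extended regex for the alternation.
  grep -rlwE 'dynamo-(sa|role|binding)' "$root" | sort
}
```

Running `find_stale_rbac_refs deploy` from the repository root would print one path per file that still refers to a removed resource, and print nothing when the tree is clean. The replacement name `dgdr-profiling-job` used in the troubleshooting hunk does not match the pattern.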