Unverified Commit eb73c2b0 authored by hhzhang16, committed by GitHub

feat: remove deploy/utils rbac (#3771)


Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
parent d81a00ef
......@@ -8,7 +8,6 @@ metadata:
spec:
  template:
    spec:
      serviceAccountName: dynamo-sa
      imagePullSecrets:
        - name: docker-imagepullsecret
      securityContext:
......
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
# TODO: update to dgdr spec for AIC
apiVersion: batch/v1
kind: Job
metadata:
......
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
# TODO: update to dgdr spec for online mode
apiVersion: batch/v1
kind: Job
metadata:
......
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
# TODO: update to dgdr spec for MoE model
apiVersion: batch/v1
kind: Job
metadata:
......
......@@ -17,9 +17,6 @@ This includes:
- `setup_benchmarking_resources.sh` — Sets up benchmarking and profiling resources in your existing Dynamo namespace
- `manifests/`
  - `serviceaccount.yaml` — ServiceAccount `dynamo-sa` for benchmarking and profiling jobs
  - `role.yaml` — Role `dynamo-role` with necessary permissions
  - `rolebinding.yaml` — RoleBinding `dynamo-binding`
  - `pvc.yaml` — PVC `dynamo-pvc` for storing profiler results and configurations
  - `pvc-access-pod.yaml` — short‑lived pod for copying profiler results from the PVC
- `kubernetes.py` — helper used by tooling to apply/read resources (e.g., access pod for PVC downloads)
......@@ -63,9 +60,6 @@ deploy/utils/setup_benchmarking_resources.sh
This script applies the following manifests to your existing Dynamo namespace:
- `deploy/utils/manifests/serviceaccount.yaml` - ServiceAccount `dynamo-sa`
- `deploy/utils/manifests/role.yaml` - Role `dynamo-role`
- `deploy/utils/manifests/rolebinding.yaml` - RoleBinding `dynamo-binding`
- `deploy/utils/manifests/pvc.yaml` - PVC `dynamo-pvc`
If `HF_TOKEN` is provided, it also creates a secret for HuggingFace model access.
......@@ -73,7 +67,6 @@ If `HF_TOKEN` is provided, it also creates a secret for HuggingFace model access
After running the setup script, verify the resources by checking:
```bash
kubectl get serviceaccount dynamo-sa -n $NAMESPACE
kubectl get pvc dynamo-pvc -n $NAMESPACE
```
......@@ -130,5 +123,4 @@ For complete benchmarking and profiling workflows:
## Notes
- Profiling job manifest remains in `benchmarks/profiler/deploy/profile_sla_job.yaml` and relies on the ServiceAccount/PVC created by the setup script.
- This setup is focused on benchmarking and profiling resources only - the main Dynamo platform must be installed separately.
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dynamo-role
  namespace: ${NAMESPACE}
rules:
  # DynamoGraphDeployment custom resources - needed for create/get/delete operations
  - apiGroups: ["nvidia.com"]
    resources: ["dynamographdeployments"]
    verbs: ["get", "create", "delete"]
  # Pods - needed for listing pods by label selector and getting logs
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
  # Services and Deployments - needed for vLLM deployments
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "create", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "create", "delete"]
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dynamo-binding
  namespace: ${NAMESPACE}
subjects:
  - kind: ServiceAccount
    name: dynamo-sa
    namespace: ${NAMESPACE}
roleRef:
  kind: Role
  name: dynamo-role
  apiGroup: rbac.authorization.k8s.io
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dynamo-sa
  namespace: ${NAMESPACE}
imagePullSecrets:
  - name: nvcr-imagepullsecret
......@@ -30,7 +30,7 @@ Usage:
NAMESPACE=<ns> [HF_TOKEN=<token>] deploy/utils/setup_benchmarking_resources.sh
Sets up benchmarking and profiling resources in an existing Dynamo namespace:
- Applies common manifests (ServiceAccount, Role, RoleBinding, PVC)
- Applies common manifests (PVC)
- Creates HuggingFace token secret if HF_TOKEN provided
- Installs benchmark dependencies if requirements.txt exists
......@@ -100,7 +100,6 @@ ok "Benchmarking resource setup complete"
# Verify installation
log "Verifying installation..."
kubectl get serviceaccount dynamo-sa -n "$NAMESPACE" >/dev/null && ok "ServiceAccount dynamo-sa exists" || err "ServiceAccount dynamo-sa not found"
kubectl get pvc dynamo-pvc -n "$NAMESPACE" >/dev/null && ok "PVC dynamo-pvc exists" || err "PVC dynamo-pvc not found"
if [[ -n "$HF_TOKEN" ]]; then
......
......@@ -326,7 +326,7 @@ The server-side benchmarking solution:
## Prerequisites
1. **Kubernetes cluster** with NVIDIA GPUs and Dynamo namespace setup (see [Dynamo Cloud/Platform docs](/docs/kubernetes/README.md))
2. **Storage and service account** PersistentVolumeClaim and service account configured with appropriate permissions (see [deploy/utils README](../../deploy/utils/README.md))
2. **Storage** PersistentVolumeClaim configured with appropriate permissions (see [deploy/utils README](../../deploy/utils/README.md))
3. **Docker image** containing the Dynamo benchmarking tools
## Quick Start
......@@ -489,7 +489,6 @@ kubectl describe pod <pod-name> -n $NAMESPACE
### Common Issues
1. **Service not found**: Ensure your DynamoGraphDeployment frontend service is running
2. **Service account permissions**: Verify `dynamo-sa` has necessary RBAC permissions
3. **PVC access**: Check that `dynamo-pvc` is properly configured and accessible
4. **Image pull issues**: Ensure the Docker image is accessible from the cluster
5. **Resource constraints**: Adjust resource limits if the job is being evicted
......@@ -500,9 +499,6 @@ kubectl describe pod <pod-name> -n $NAMESPACE
# Check PVC status
kubectl get pvc dynamo-pvc -n $NAMESPACE
# Verify service account
kubectl get sa dynamo-sa -n $NAMESPACE
# Check service endpoints
kubectl get svc -n $NAMESPACE
......
......@@ -217,7 +217,7 @@ If you see `ErrImagePull` or `ImagePullBackOff` errors with 401 unauthorized mes
2. Verify the service account was created with the image pull secret:
```bash
kubectl get serviceaccount dynamo-sa -n $NAMESPACE -o yaml
kubectl get serviceaccount dgdr-profiling-job -n $NAMESPACE -o yaml
```
3. The service account should show `imagePullSecrets` containing `nvcr-imagepullsecret`.
......
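Not part of this commit: a minimal sketch of how the check in step 3 above could be scripted instead of read off the YAML by eye. It assumes the `dgdr-profiling-job` service account and the `nvcr-imagepullsecret` name shown in the hunk, and relies on `kubectl`'s `-o jsonpath` output. The helper name `has_pull_secret` is hypothetical.

```bash
# Hypothetical helper (not from the repository).
# $1: space-separated list of secret names, $2: secret name to look for.
# Returns 0 if the name is in the list, 1 otherwise.
has_pull_secret() {
  for name in $1; do
    [ "$name" = "$2" ] && return 0
  done
  return 1
}

# Usage against a live cluster (requires kubectl and $NAMESPACE to be set):
#   names=$(kubectl get serviceaccount dgdr-profiling-job -n "$NAMESPACE" \
#     -o jsonpath='{.imagePullSecrets[*].name}')
#   has_pull_secret "$names" nvcr-imagepullsecret \
#     && echo "image pull secret attached" \
#     || echo "image pull secret missing"
```

The lookup is kept separate from the `kubectl` call so that it can be exercised without a cluster.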