feat: remove scripts for manipulating pvcs (#4206)

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

Commit 0e623146 · 94fa72ca
Unverified commit, authored Nov 10, 2025 by hhzhang16 and committed by GitHub on Nov 10, 2025
6 changed files
--- a/deploy/utils/README.md
+++ b/deploy/utils/README.md
@@ -19,9 +19,7 @@ This includes:
 - `manifests/`
  - `pvc.yaml` — PVC `dynamo-pvc` for storing profiler results and configurations
  - `pvc-access-pod.yaml` — short‑lived pod for copying profiler results from the PVC
-- `kubernetes.py` — helper used by tooling to apply/read resources (e.g., access pod for PVC downloads)
+- `kubernetes.py` — helper used by tooling to apply/read resources (e.g., access pod for PVC access)
-- `inject_manifest.py` — utility for injecting deployment configurations into the PVC for profiling
-- `download_pvc_results.py` — utility for downloading benchmark/profiling results from the PVC
 - `dynamo_deployment.py` — utilities for working with DynamoGraphDeployment resources
 - `requirements.txt` — Python dependencies for benchmarking utilities
@@ -70,37 +68,44 @@ After running the setup script, verify the resources by checking:
 kubectl get pvc dynamo-pvc -n $NAMESPACE
 ```
-### PVC Manipulation Scripts
+### Working with the PVC
-These scripts interact with the Persistent Volume Claim (PVC) that stores configuration files and benchmark/profiling results. They're essential for the Dynamo benchmarking and profiling workflows.
+The Persistent Volume Claim (PVC) stores configuration files and benchmark/profiling results. Use `kubectl cp` to copy files to and from the PVC.
-#### Why These Scripts Are Needed
+#### Setting Up PVC Access
-1. **For Pre-Deployment Profiling**: The profiling job needs access to your Dynamo deployment configurations (DGD manifests) to test different parallelization strategies
+First, create a temporary access pod to interact with the PVC:
-2. **For Retrieving Results**: Both benchmarking and profiling jobs write their results to the PVC, which you need to download for analysis
-#### Script Usage
+```bash
+# Create access pod
+kubectl apply -f deploy/utils/manifests/pvc-access-pod.yaml -n $NAMESPACE
+# Wait for pod to be ready
+kubectl wait --for=condition=Ready pod/pvc-access-pod -n $NAMESPACE --timeout=60s
+```
-**Inject deployment configurations for profiling:**
+#### Copying Files to the PVC
+**Copy deployment configurations for profiling:**
 ```bash
-# The profiling job reads your DGD config from the PVC
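+# Copy a single file (create the destination directory first; kubectl cp may fail if it does not exist)
-# IMPORTANT: All paths must start with /data/ for security reasons
+kubectl exec pvc-access-pod -n $NAMESPACE -- mkdir -p /data/configs
+kubectl cp ./my-disagg.yaml $NAMESPACE/pvc-access-pod:/data/configs/disagg.yaml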
-python3 -m deploy.utils.inject_manifest \
-  --namespace $NAMESPACE \
+# Copy an entire directory
-  --src ./my-disagg.yaml \
+kubectl cp ./configs/ $NAMESPACE/pvc-access-pod:/data/configs/
-  --dest /data/configs/disagg.yaml
 ```
+#### Downloading Files from the PVC
 **Download benchmark results:**
 ```bash
-# After benchmarking completes, download results
+# Download entire results directory
-python3 -m deploy.utils.download_pvc_results \
+kubectl cp $NAMESPACE/pvc-access-pod:/data/results ./benchmarks/results
-  --namespace $NAMESPACE \
-  --output-dir ./benchmarks/results \
+# Download a specific subdirectory
-  --folder /data/results \
+kubectl cp $NAMESPACE/pvc-access-pod:/data/results/benchmark-name ./benchmarks/results/benchmark-name
-  --no-config   # optional: skip *.yaml/*.yml in the download
 ```
 **Download profiling results (optional, for local inspection):**
@@ -108,26 +113,27 @@ python3 -m deploy.utils.download_pvc_results \
 ```bash
 # Optional: Download profiling data for local analysis
 # The planner reads directly from the PVC, so this is only needed for inspection
-python3 -m deploy.utils.download_pvc_results \
+kubectl cp $NAMESPACE/pvc-access-pod:/data ./profiling_data
-  --namespace $NAMESPACE \
-  --output-dir ./profiling_data \
-  --folder /data
 ```
 > **Note on Profiling Results**: When using DGDR (DynamoGraphDeploymentRequest) for SLA-driven profiling, profiling data is stored in `/data/` on the PVC. The planner component reads this data directly from the PVC, so downloading is **optional** - only needed if you want to inspect the profiling results locally (e.g., view performance plots, check configurations).
-#### Path Requirements
+#### Cleanup Access Pod
+When finished, delete the access pod:
+```bash
+kubectl delete pod pvc-access-pod -n $NAMESPACE
+```
-**Important**: The PVC is mounted at `/data` in the access pod for security reasons. All destination paths must start with `/data/`.
+#### Path Structure
-**Common path patterns:**
+**Common path patterns in the PVC:**
 - `/data/configs/` - Configuration files (DGD manifests)
 - `/data/results/` - Benchmark results (for download after benchmarking jobs)
 - `/data/` - Profiling data (used directly by planner, typically not downloaded)
 - `/data/benchmarking/` - Benchmarking artifacts
-**User-friendly error messages**: If you forget the `/data/` prefix, the script will show a helpful error message with the correct path and example commands.
 #### Next Steps
 For complete benchmarking and profiling workflows:

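The README steps above (create the access pod, wait for Ready, `kubectl cp`, delete the pod) can be wrapped in a script if you run them often. The following is a minimal Python sketch, not part of this PR: the helper name `pvc_copy_commands` is hypothetical, and the pod name and manifest path are taken from the README. It only assembles the `kubectl` argument lists, so they can be printed or reviewed before anything touches the cluster.

```python
import shlex
from typing import List

# Assumed from the README in this PR; adjust if your manifest differs.
ACCESS_POD = "pvc-access-pod"
ACCESS_POD_MANIFEST = "deploy/utils/manifests/pvc-access-pod.yaml"


def pvc_copy_commands(namespace: str, src: str, dest: str, download: bool) -> List[List[str]]:
    """Build the kubectl commands for one copy through the access pod (hypothetical helper).

    If download is True, `src` is a path inside the PVC (under /data) and `dest` is local.
    Otherwise `src` is local and `dest` is a path inside the PVC.
    """
    pod_ref = f"{namespace}/{ACCESS_POD}"
    copy_src = f"{pod_ref}:{src}" if download else src
    copy_dest = dest if download else f"{pod_ref}:{dest}"
    return [
        # Create the access pod and wait until it reports Ready.
        ["kubectl", "apply", "-f", ACCESS_POD_MANIFEST, "-n", namespace],
        ["kubectl", "wait", "--for=condition=Ready", f"pod/{ACCESS_POD}",
         "-n", namespace, "--timeout=60s"],
        # Copy in the requested direction.
        ["kubectl", "cp", copy_src, copy_dest],
        # Remove the pod once the copy is done.
        ["kubectl", "delete", "pod", ACCESS_POD, "-n", namespace],
    ]


if __name__ == "__main__":
    # Print the commands for downloading benchmark results, one per line.
    for cmd in pvc_copy_commands("dynamo", "/data/results", "./benchmarks/results", download=True):
        print(shlex.join(cmd))
```

To execute them, pass each list to `subprocess.run(cmd, check=True)`. Putting the `delete` step in a `finally` block keeps the pod from lingering after a failed copy, which is what the removed scripts did with `cleanup_access_pod`.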
--- a/deploy/utils/download_pvc_results.py
+++ b/deploy/utils/download_pvc_results.py
-#!/usr/bin/env python3
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-"""
-PVC Results Download Script (generic)
-Downloads files from a specified folder path inside a Kubernetes PVC into a local directory.
-Creates an access pod, copies files, and exits. You can optionally exclude YAML configs.
-Usage:
-    python3 download_pvc_results.py --namespace <namespace> --output-dir <local_directory> \
-        --folder /data/<folder/in/pvc> [--no-config]
-"""
-import argparse
-import subprocess
-import sys
-from pathlib import Path
-from typing import List
-try:
-    from deploy.utils.kubernetes import (
-        check_kubectl_access,
-        cleanup_access_pod,
-        ensure_clean_access_pod,
-        run_command,
-    )
-except ModuleNotFoundError:
-    # Allow running as a script: add repo root to sys.path
-    repo_root = Path(__file__).resolve().parents[2]
-    sys.path.insert(0, str(repo_root))
-    from deploy.utils.kubernetes import (
-        check_kubectl_access,
-        cleanup_access_pod,
-        ensure_clean_access_pod,
-        run_command,
-    )
-def list_pvc_contents(
-    namespace: str, pod_name: str, base_folder: str, skip_config: bool = False
-) -> List[str]:
-    """List contents of the PVC to identify files.
-    Downloads all files under base_folder. If skip_config is True, excludes *.yaml and *.yml.
-    """
-    print("Scanning PVC contents...")
-    # Build find command: all files
-    find_cmd = [
-        "kubectl",
-        "exec",
-        pod_name,
-        "-n",
-        namespace,
-        "--",
-        "find",
-        base_folder,
-        "-type",
-        "f",
-    ]
-    # Exclude YAML files when requested
-    if skip_config:
-        find_cmd.extend(["-not", "-name", "*.yaml", "-not", "-name", "*.yml"])
-    try:
-        result = run_command(find_cmd, capture_output=True)
-        files = [f.strip() for f in result.stdout.split("\n") if f.strip()]
-        config_note = " (excluding config files)" if skip_config else ""
-        print(f"Found {len(files)} files to download{config_note}")
-        return files
-    except subprocess.CalledProcessError:
-        print("ERROR: Failed to list PVC contents")
-        sys.exit(1)
-def download_files(
-    namespace: str, pod_name: str, files: List[str], output_dir: Path, base_folder: str
-) -> None:
-    """Download relevant files from PVC to local directory."""
-    if not files:
-        print("No files to download")
-        return
-    # Create output directory
-    output_dir.mkdir(parents=True, exist_ok=True)
-    print(f"Downloading {len(files)} files to {output_dir}")
-    downloaded = 0
-    failed = 0
-    for file_path in files:
-        try:
-            # Determine relative path and create local structure based on base_folder
-            prefix = base_folder.rstrip("/") + "/"
-            rel_path = (
-                file_path[len(prefix) :]
-                if file_path.startswith(prefix)
-                else file_path.lstrip("/")
-            )
-            # Validate relative path
-            if ".." in rel_path or rel_path.startswith("/"):
-                print(f"  WARNING: Skipping potentially unsafe path: {file_path}")
-                failed += 1
-                continue
-            local_file = output_dir / rel_path
-            # Ensure the file is within output_dir
-            if not local_file.resolve().is_relative_to(output_dir.resolve()):
-                print(f"  WARNING: Skipping file outside output directory: {file_path}")
-                failed += 1
-                continue
-            local_file.parent.mkdir(parents=True, exist_ok=True)
-            # Download file
-            run_command(
-                [
-                    "kubectl",
-                    "cp",
-                    f"{namespace}/{pod_name}:{file_path}",
-                    str(local_file),
-                ],
-                capture_output=True,
-            )
-            downloaded += 1
-            if downloaded % 5 == 0:  # Progress update every 5 files
-                print(f"  Downloaded {downloaded}/{len(files)} files...")
-        except subprocess.CalledProcessError as e:
-            print(f"  WARNING: Failed to download {file_path}: {e}")
-            failed += 1
-    print(f"✓ Download completed: {downloaded} successful, {failed} failed")
-def main():
-    parser = argparse.ArgumentParser(
-        description="Download profiling results from PVC to local directory",
-        formatter_class=argparse.RawDescriptionHelpFormatter,
-        epilog=__doc__,
-    )
-    parser.add_argument(
-        "--namespace",
-        "-n",
-        required=True,
-        help="Kubernetes namespace containing the profiling PVC",
-    )
-    parser.add_argument(
-        "--output-dir",
-        "-o",
-        type=Path,
-        required=True,
-        help="Local directory to download results to",
-    )
-    parser.add_argument(
-        "--no-config",
-        action="store_true",
-        help="Skip downloading configuration files (*.yaml, *.yml)",
-    )
-    parser.add_argument(
-        "--folder",
-        required=True,
-        help="Absolute folder path in the PVC to download, must start with /data",
-    )
-    args = parser.parse_args()
-    # Validate folder path starts with /data/
-    if not args.folder.startswith("/data/"):
-        print("❌ Error: Folder path must start with '/data/'")
-        print(f"   Provided: {args.folder}")
-        print("   Quick Fix: Add '/data/' prefix to your path")
-        sys.exit(1)
-    print("📥 PVC Results Download")
-    print("=" * 40)
-    # Validate inputs
-    check_kubectl_access(args.namespace)
-    # Deploy access pod
-    pod_name = ensure_clean_access_pod(args.namespace)
-    try:
-        # List and download files
-        files = list_pvc_contents(args.namespace, pod_name, args.folder, args.no_config)
-        download_files(args.namespace, pod_name, files, args.output_dir, args.folder)
-    finally:
-        # Cleanup
-        cleanup_access_pod(args.namespace)
-    print("\n✅ Download completed!")
-    print(f"📁 Results available at: {args.output_dir.absolute()}")
-    print("📄 See README.md for file descriptions")
-if __name__ == "__main__":
-    main()
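With this script gone, a single `kubectl cp` of a directory replaces its per-file loop. If you still want per-file control (for example, to reproduce the old `--no-config` filter), the path-safety logic from the removed `download_files` can be kept as a small standalone function. A sketch, assuming Python 3.9+ for `Path.is_relative_to`; `local_path_for` is a hypothetical name, not something that exists in the repository.

```python
from pathlib import Path
from typing import Optional


def local_path_for(pvc_file: str, base_folder: str, output_dir: Path) -> Optional[Path]:
    """Map a file path inside the PVC to a local path under output_dir.

    Returns None when the path looks unsafe, mirroring the checks in the removed
    download_pvc_results.download_files (hypothetical standalone helper).
    """
    prefix = base_folder.rstrip("/") + "/"
    # Keep the path relative to the requested PVC folder when possible.
    if pvc_file.startswith(prefix):
        rel_path = pvc_file[len(prefix):]
    else:
        rel_path = pvc_file.lstrip("/")
    # Reject any ".." substring (conservative: also rejects names like "a..b")
    # and any remainder that is still absolute.
    if ".." in rel_path or rel_path.startswith("/"):
        return None
    local_file = output_dir / rel_path
    # Final guard: the resolved path must stay inside output_dir.
    if not local_file.resolve().is_relative_to(output_dir.resolve()):
        return None
    return local_file
```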
--- a/deploy/utils/inject_manifest.py
+++ b/deploy/utils/inject_manifest.py
-#!/usr/bin/env python3
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-"""
-Manifest Injection Script
-Copies any Kubernetes manifest file into the PVC for later use by jobs.
-Both the source manifest path and destination path in the PVC are required.
-IMPORTANT: The PVC is mounted at /data in the access pod for security reasons.
-All destination paths must start with '/data/'.
-Usage:
-    python3 inject_manifest.py --namespace <namespace> --src <local_manifest.yaml> --dest <absolute_path_in_pvc>
-Examples:
-    python3 inject_manifest.py --namespace <ns> --src ./disagg.yaml --dest /data/configs/disagg.yaml
-    python3 inject_manifest.py --namespace <ns> --src ./my-data.yaml    --dest /data/custom/path/data.yaml
-"""
-import argparse
-import sys
-from pathlib import Path
-from deploy.utils.kubernetes import (
-    PVC_ACCESS_POD_NAME,
-    check_kubectl_access,
-    cleanup_access_pod,
-    ensure_clean_access_pod,
-    run_command,
-)
-def copy_manifest(namespace: str, manifest_path: Path, target_path: str) -> None:
-    """Copy a manifest file into the PVC via the access pod."""
-    pod_name = PVC_ACCESS_POD_NAME
-    if not manifest_path.exists():
-        print(f"ERROR: Manifest file not found: {manifest_path}")
-        sys.exit(1)
-    print(f"Copying {manifest_path} to {target_path} in PVC...")
-    # Ensure destination directory exists
-    target_dir = str(Path(target_path).parent)
-    run_command(
-        ["kubectl", "exec", pod_name, "-n", namespace, "--", "mkdir", "-p", target_dir],
-        capture_output=False,
-    )
-    # Copy file to pod
-    run_command(
-        [
-            "kubectl",
-            "cp",
-            str(manifest_path),
-            f"{namespace}/{pod_name}:{target_path}",
-        ],
-        capture_output=False,
-    )
-    # Verify the file was copied
-    result = run_command(
-        ["kubectl", "exec", pod_name, "-n", namespace, "--", "ls", "-la", target_path],
-        capture_output=True,
-    )
-    print("✓ Manifest successfully copied to PVC")
-    print(f"File details: {result.stdout.strip()}")
-def main():
-    parser = argparse.ArgumentParser(
-        description="Inject a Kubernetes manifest into the PVC",
-        formatter_class=argparse.RawDescriptionHelpFormatter,
-        epilog=__doc__,
-    )
-    parser.add_argument(
-        "--namespace",
-        "-n",
-        required=True,
-        help="Kubernetes namespace containing the profiling PVC",
-    )
-    parser.add_argument(
-        "--src", required=True, type=Path, help="Path to manifest file to copy"
-    )
-    parser.add_argument(
-        "--dest",
-        required=True,
-        help="Absolute target path in PVC (must start with /data/, e.g., /data/configs/agg.yaml)",
-    )
-    args = parser.parse_args()
-    # Validate target_path to prevent directory traversal and ensure it's within PVC
-    if not args.dest.startswith("/data/"):
-        print("=" * 60)
-        print("❌ ERROR: Invalid target path")
-        print("=" * 60)
-        print("The PVC is mounted at /data in the access pod.")
-        print("All paths must start with '/data/' for security reasons.")
-        print("")
-        print("💡 QUICK FIX:")
-        if args.dest.startswith("/"):
-            # Suggest the fix
-            suggested_path = f"/data{args.dest}"
-            print(f"  Change: {args.dest}")
-            print(f"  To:     {suggested_path}")
-            print("")
-            print("📝 Example commands:")
-            print("  python3 -m deploy.utils.inject_manifest \\")
-            print(f"    --namespace {args.namespace} \\")
-            print(f"    --src {args.src} \\")
-            print(f"    --dest {suggested_path}")
-        else:
-            print(f"  Use: /data/{args.dest.lstrip('/')}")
-        print("")
-        print("🔍 Common patterns:")
-        print("  /configs/file.yaml     → /data/configs/file.yaml")
-        print("  /results/data.yaml     → /data/results/data.yaml")
-        print("=" * 60)
-        sys.exit(1)
-    if ".." in args.dest:
-        print("ERROR: Target path cannot contain '..'")
-        sys.exit(1)
-    print("🚀 Manifest Injection")
-    print("=" * 40)
-    # Validate inputs
-    check_kubectl_access(args.namespace)
-    # Deploy access pod
-    ensure_clean_access_pod(args.namespace)
-    try:
-        # Copy manifest
-        copy_manifest(args.namespace, args.src, args.dest)
-        print("\n✅ Manifest injection completed!")
-        print(f"📁 File available at: {args.dest}")
-    finally:
-        # Cleanup even on failure
-        cleanup_access_pod(args.namespace)
-if __name__ == "__main__":
-    main()
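The README now only documents the `/data/` layout; nothing validates destination paths before `kubectl cp`. Because the PVC is mounted at `/data` in the access pod, a file copied elsewhere (assuming the pod mounts no other volumes) lands in the pod's own filesystem and is lost when the pod is deleted. Below is a sketch of a pre-copy check modeled on the validation removed above (a `/data/` prefix check with a suggested fix, plus rejecting `..`). `check_pvc_dest` is a hypothetical helper, and unlike the original it rejects `..` before checking the prefix.

```python
from typing import Optional, Tuple

PVC_MOUNT = "/data/"  # mount point of the PVC in the access pod, per the README


def check_pvc_dest(dest: str) -> Tuple[bool, Optional[str]]:
    """Validate a destination path inside the access pod (hypothetical helper).

    Returns (True, None) for an acceptable path, or (False, suggestion), where the
    suggestion follows the removed inject_manifest.py and prefixes "/data".
    """
    if ".." in dest:
        # Traversal is rejected outright; there is no safe automatic fix.
        return False, None
    if dest.startswith(PVC_MOUNT):
        return True, None
    if dest.startswith("/"):
        # e.g. /configs/file.yaml -> /data/configs/file.yaml
        return False, f"/data{dest}"
    # Relative path: anchor it under the mount point.
    return False, f"/data/{dest}"
```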
--- a/deploy/utils/setup_benchmarking_resources.sh
+++ b/deploy/utils/setup_benchmarking_resources.sh
@@ -70,9 +70,9 @@ log "Applying benchmarking manifests to namespace $NAMESPACE"
 export NAMESPACE  # ensure envsubst can see it
 for mf in "$(dirname "$0")/manifests"/*.yaml; do
  if [[ -f "$mf" ]]; then
-    # Skip pvc-access-pod.yaml as it's managed by inject_manifest.py
+    # Skip pvc-access-pod.yaml as it's created on-demand by users
    if [[ "$(basename "$mf")" == "pvc-access-pod.yaml" ]]; then
-      log "Skipping $mf (managed by inject_manifest.py)"
+      log "Skipping $mf (created on-demand when accessing PVC)"
      continue
    fi

--- a/docs/benchmarks/benchmarking.md
+++ b/docs/benchmarks/benchmarking.md
@@ -364,12 +364,15 @@ kubectl apply -f benchmarks/incluster/benchmark_job.yaml -n $NAMESPACE
 ### Step 3: Retrieve Results
 ```bash
-# Download results from PVC (recommended)
+# Create access pod (skip this step if access pod is already running)
-python3 -m deploy.utils.download_pvc_results \
+kubectl apply -f deploy/utils/manifests/pvc-access-pod.yaml -n $NAMESPACE
-  --namespace $NAMESPACE \
+kubectl wait --for=condition=Ready pod/pvc-access-pod -n $NAMESPACE --timeout=60s
-  --output-dir ./benchmarks/results/<benchmark-name> \
-  --folder /data/results/<benchmark-name> \
+# Download the results
-  --no-config
+kubectl cp $NAMESPACE/pvc-access-pod:/data/results/<benchmark-name> ./benchmarks/results/<benchmark-name>
+# Cleanup
+kubectl delete pod pvc-access-pod -n $NAMESPACE
 ```
 ### Step 4: Generate Plots

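The updated docs say to skip creating the access pod if it is already running. One way to make that decision in a script is to read the pod's phase first. A Python sketch, assuming the pod name and manifest path used in this PR; `ensure_access_pod` and its injectable `run` parameter are hypothetical, the latter added so the logic can be exercised without a cluster. It does not handle a pod left in a terminal phase (`Succeeded` or `Failed`); delete such a pod before rerunning.

```python
import subprocess
from typing import Callable, List

ACCESS_POD = "pvc-access-pod"
ACCESS_POD_MANIFEST = "deploy/utils/manifests/pvc-access-pod.yaml"

# A runner takes an argv list and returns a CompletedProcess with text output.
Runner = Callable[[List[str]], subprocess.CompletedProcess]


def _run(cmd: List[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)


def access_pod_running(namespace: str, run: Runner = _run) -> bool:
    """Return True if the access pod exists and is in phase Running."""
    result = run(["kubectl", "get", "pod", ACCESS_POD, "-n", namespace,
                  "-o", "jsonpath={.status.phase}"])
    # A non-zero exit usually means the pod was not found.
    return result.returncode == 0 and result.stdout.strip() == "Running"


def ensure_access_pod(namespace: str, run: Runner = _run) -> None:
    """Create the access pod only if it is not running, then wait for Ready."""
    if not access_pod_running(namespace, run):
        created = run(["kubectl", "apply", "-f", ACCESS_POD_MANIFEST, "-n", namespace])
        if created.returncode != 0:
            raise RuntimeError(f"kubectl apply failed: {created.stderr}")
    ready = run(["kubectl", "wait", "--for=condition=Ready", f"pod/{ACCESS_POD}",
                 "-n", namespace, "--timeout=60s"])
    if ready.returncode != 0:
        raise RuntimeError(f"access pod did not become Ready: {ready.stderr}")
```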
--- a/docs/planner/sla_planner_quickstart.md
+++ b/docs/planner/sla_planner_quickstart.md
@@ -379,11 +379,16 @@ For advanced use cases, you can manually deploy using the standalone planner tem
 ```bash
 # After profiling completes, profiling data is stored on the PVC at /data
-# Optional: Download profiling results for local inspection
+# OPTIONAL: Download profiling results for local inspection
-python3 -m deploy.utils.download_pvc_results \
+# Create access pod (skip this step if access pod is already running)
-  --namespace $NAMESPACE \
+kubectl apply -f deploy/utils/manifests/pvc-access-pod.yaml -n $NAMESPACE
-  --output-dir ./profiling_data \
+kubectl wait --for=condition=Ready pod/pvc-access-pod -n $NAMESPACE --timeout=60s
-  --folder /data
+# Download the data
+kubectl cp $NAMESPACE/pvc-access-pod:/data ./profiling_data
+# Cleanup
+kubectl delete pod pvc-access-pod -n $NAMESPACE
 # Update backend planner manifest as needed, then deploy
 kubectl apply -f examples/backends/<backend>/deploy/disagg_planner.yaml -n $NAMESPACE