Unverified commit 0e623146, authored by hhzhang16, committed by GitHub

feat: remove scripts for manipulating pvcs (#4206)


Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
parent 94fa72ca
@@ -19,9 +19,7 @@ This includes:
 - `manifests/`
   - `pvc.yaml` — PVC `dynamo-pvc` for storing profiler results and configurations
   - `pvc-access-pod.yaml` — short‑lived pod for copying profiler results from the PVC
-- `kubernetes.py` — helper used by tooling to apply/read resources (e.g., access pod for PVC downloads)
-- `inject_manifest.py` — utility for injecting deployment configurations into the PVC for profiling
-- `download_pvc_results.py` — utility for downloading benchmark/profiling results from the PVC
+- `kubernetes.py` — helper used by tooling to apply/read resources (e.g., access pod for PVC access)
 - `dynamo_deployment.py` — utilities for working with DynamoGraphDeployment resources
 - `requirements.txt` — Python dependencies for benchmarking utilities
@@ -70,37 +68,44 @@ After running the setup script, verify the resources by checking:
 kubectl get pvc dynamo-pvc -n $NAMESPACE
 ```
-### PVC Manipulation Scripts
+### Working with the PVC
-These scripts interact with the Persistent Volume Claim (PVC) that stores configuration files and benchmark/profiling results. They're essential for the Dynamo benchmarking and profiling workflows.
+The Persistent Volume Claim (PVC) stores configuration files and benchmark/profiling results. Use `kubectl cp` to copy files to and from the PVC.
-#### Why These Scripts Are Needed
-1. **For Pre-Deployment Profiling**: The profiling job needs access to your Dynamo deployment configurations (DGD manifests) to test different parallelization strategies
-2. **For Retrieving Results**: Both benchmarking and profiling jobs write their results to the PVC, which you need to download for analysis
-#### Script Usage
+#### Setting Up PVC Access
+First, create a temporary access pod to interact with the PVC:
+```bash
+# Create access pod
+kubectl apply -f deploy/utils/manifests/pvc-access-pod.yaml -n $NAMESPACE
+# Wait for pod to be ready
+kubectl wait --for=condition=Ready pod/pvc-access-pod -n $NAMESPACE --timeout=60s
+```
-**Inject deployment configurations for profiling:**
+#### Copying Files to the PVC
+**Copy deployment configurations for profiling:**
 ```bash
-# The profiling job reads your DGD config from the PVC
-# IMPORTANT: All paths must start with /data/ for security reasons
-python3 -m deploy.utils.inject_manifest \
-  --namespace $NAMESPACE \
-  --src ./my-disagg.yaml \
-  --dest /data/configs/disagg.yaml
+# Copy a single file
+kubectl cp ./my-disagg.yaml $NAMESPACE/pvc-access-pod:/data/configs/disagg.yaml
+
+# Copy an entire directory
+kubectl cp ./configs/ $NAMESPACE/pvc-access-pod:/data/configs/
 ```
+#### Downloading Files from the PVC
 **Download benchmark results:**
 ```bash
-# After benchmarking completes, download results
-python3 -m deploy.utils.download_pvc_results \
-  --namespace $NAMESPACE \
-  --output-dir ./benchmarks/results \
-  --folder /data/results \
-  --no-config # optional: skip *.yaml/*.yml in the download
+# Download entire results directory
+kubectl cp $NAMESPACE/pvc-access-pod:/data/results ./benchmarks/results
+
+# Download a specific subdirectory
+kubectl cp $NAMESPACE/pvc-access-pod:/data/results/benchmark-name ./benchmarks/results/benchmark-name
 ```
 **Download profiling results (optional, for local inspection):**
@@ -108,26 +113,27 @@ python3 -m deploy.utils.download_pvc_results \
 ```bash
 # Optional: Download profiling data for local analysis
 # The planner reads directly from the PVC, so this is only needed for inspection
-python3 -m deploy.utils.download_pvc_results \
-  --namespace $NAMESPACE \
-  --output-dir ./profiling_data \
-  --folder /data
+kubectl cp $NAMESPACE/pvc-access-pod:/data ./profiling_data
 ```
 > **Note on Profiling Results**: When using DGDR (DynamoGraphDeploymentRequest) for SLA-driven profiling, profiling data is stored in `/data/` on the PVC. The planner component reads this data directly from the PVC, so downloading is **optional** - only needed if you want to inspect the profiling results locally (e.g., view performance plots, check configurations).
-#### Path Requirements
-**Important**: The PVC is mounted at `/data` in the access pod for security reasons. All destination paths must start with `/data/`.
-**Common path patterns:**
+#### Cleanup Access Pod
+When finished, delete the access pod:
+```bash
+kubectl delete pod pvc-access-pod -n $NAMESPACE
+```
+#### Path Structure
+**Common path patterns in the PVC:**
 - `/data/configs/` - Configuration files (DGD manifests)
 - `/data/results/` - Benchmark results (for download after benchmarking jobs)
 - `/data/` - Profiling data (used directly by planner, typically not downloaded)
 - `/data/benchmarking/` - Benchmarking artifacts
-**User-friendly error messages**: If you forget the `/data/` prefix, the script will show a helpful error message with the correct path and example commands.
 #### Next Steps
 For complete benchmarking and profiling workflows:
......
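The manual workflow introduced in the README hunk above (apply the access pod, wait for readiness, copy, delete) is easy to script for repeated use. The following is a minimal sketch and not part of this commit: it only builds the `kubectl` argument lists shown in the README so they can be inspected or handed to `subprocess.run`. The function name `pvc_download_commands` and its parameters are hypothetical; the pod name and manifest path are taken from the README commands.

```python
from typing import List

# Values copied from the README commands in this commit.
ACCESS_POD = "pvc-access-pod"
ACCESS_POD_MANIFEST = "deploy/utils/manifests/pvc-access-pod.yaml"


def pvc_download_commands(namespace: str, remote: str, local: str) -> List[List[str]]:
    """Return the kubectl commands for one download, in execution order.

    Hypothetical helper: nothing is executed here, the caller decides how to run them.
    """
    return [
        # 1. Create the temporary access pod
        ["kubectl", "apply", "-f", ACCESS_POD_MANIFEST, "-n", namespace],
        # 2. Wait until the pod reports Ready
        ["kubectl", "wait", "--for=condition=Ready", f"pod/{ACCESS_POD}",
         "-n", namespace, "--timeout=60s"],
        # 3. Copy the remote path from the pod to the local path
        ["kubectl", "cp", f"{namespace}/{ACCESS_POD}:{remote}", local],
        # 4. Delete the access pod
        ["kubectl", "delete", "pod", ACCESS_POD, "-n", namespace],
    ]
```

To run the steps, each list can be passed to `subprocess.run(cmd, check=True)`. Putting the final delete in a `finally` block would mirror how the removed scripts always called `cleanup_access_pod`, even on failure.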
#!/usr/bin/env python3
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
PVC Results Download Script (generic)

Downloads files from a specified folder path inside a Kubernetes PVC into a local directory.
Creates an access pod, copies files, and exits. You can optionally exclude YAML configs.

Usage:
    python3 download_pvc_results.py --namespace <namespace> --output-dir <local_directory> \
        --folder /data/<folder/in/pvc> [--no-config]
"""

import argparse
import subprocess
import sys
from pathlib import Path
from typing import List

try:
    from deploy.utils.kubernetes import (
        check_kubectl_access,
        cleanup_access_pod,
        ensure_clean_access_pod,
        run_command,
    )
except ModuleNotFoundError:
    # Allow running as a script: add repo root to sys.path
    repo_root = Path(__file__).resolve().parents[2]
    sys.path.insert(0, str(repo_root))
    from deploy.utils.kubernetes import (
        check_kubectl_access,
        cleanup_access_pod,
        ensure_clean_access_pod,
        run_command,
    )


def list_pvc_contents(
    namespace: str, pod_name: str, base_folder: str, skip_config: bool = False
) -> List[str]:
    """List contents of the PVC to identify files.

    Downloads all files under base_folder. If skip_config is True, excludes *.yaml and *.yml.
    """
    print("Scanning PVC contents...")

    # Build find command: all files
    find_cmd = [
        "kubectl",
        "exec",
        pod_name,
        "-n",
        namespace,
        "--",
        "find",
        base_folder,
        "-type",
        "f",
    ]

    # Exclude YAML files when requested
    if skip_config:
        find_cmd.extend(["-not", "-name", "*.yaml", "-not", "-name", "*.yml"])

    try:
        result = run_command(find_cmd, capture_output=True)
        files = [f.strip() for f in result.stdout.split("\n") if f.strip()]
        config_note = " (excluding config files)" if skip_config else ""
        print(f"Found {len(files)} files to download{config_note}")
        return files
    except subprocess.CalledProcessError:
        print("ERROR: Failed to list PVC contents")
        sys.exit(1)


def download_files(
    namespace: str, pod_name: str, files: List[str], output_dir: Path, base_folder: str
) -> None:
    """Download relevant files from PVC to local directory."""
    if not files:
        print("No files to download")
        return

    # Create output directory
    output_dir.mkdir(parents=True, exist_ok=True)
    print(f"Downloading {len(files)} files to {output_dir}")

    downloaded = 0
    failed = 0
    for file_path in files:
        try:
            # Determine relative path and create local structure based on base_folder
            prefix = base_folder.rstrip("/") + "/"
            rel_path = (
                file_path[len(prefix) :]
                if file_path.startswith(prefix)
                else file_path.lstrip("/")
            )

            # Validate relative path
            if ".." in rel_path or rel_path.startswith("/"):
                print(f" WARNING: Skipping potentially unsafe path: {file_path}")
                failed += 1
                continue

            local_file = output_dir / rel_path

            # Ensure the file is within output_dir
            if not local_file.resolve().is_relative_to(output_dir.resolve()):
                print(f" WARNING: Skipping file outside output directory: {file_path}")
                failed += 1
                continue

            local_file.parent.mkdir(parents=True, exist_ok=True)

            # Download file
            run_command(
                [
                    "kubectl",
                    "cp",
                    f"{namespace}/{pod_name}:{file_path}",
                    str(local_file),
                ],
                capture_output=True,
            )
            downloaded += 1
            if downloaded % 5 == 0:  # Progress update every 5 files
                print(f" Downloaded {downloaded}/{len(files)} files...")
        except subprocess.CalledProcessError as e:
            print(f" WARNING: Failed to download {file_path}: {e}")
            failed += 1

    print(f"✓ Download completed: {downloaded} successful, {failed} failed")


def main():
    parser = argparse.ArgumentParser(
        description="Download profiling results from PVC to local directory",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__,
    )
    parser.add_argument(
        "--namespace",
        "-n",
        required=True,
        help="Kubernetes namespace containing the profiling PVC",
    )
    parser.add_argument(
        "--output-dir",
        "-o",
        type=Path,
        required=True,
        help="Local directory to download results to",
    )
    parser.add_argument(
        "--no-config",
        action="store_true",
        help="Skip downloading configuration files (*.yaml, *.yml)",
    )
    parser.add_argument(
        "--folder",
        required=True,
        help="Absolute folder path in the PVC to download, must start with /data",
    )
    args = parser.parse_args()

    # Validate folder path starts with /data/
    if not args.folder.startswith("/data/"):
        print("❌ Error: Folder path must start with '/data/'")
        print(f" Provided: {args.folder}")
        print(" Quick Fix: Add '/data/' prefix to your path")
        sys.exit(1)

    print("📥 PVC Results Download")
    print("=" * 40)

    # Validate inputs
    check_kubectl_access(args.namespace)

    # Deploy access pod
    pod_name = ensure_clean_access_pod(args.namespace)
    try:
        # List and download files
        files = list_pvc_contents(args.namespace, pod_name, args.folder, args.no_config)
        download_files(args.namespace, pod_name, files, args.output_dir, args.folder)
    finally:
        # Cleanup
        cleanup_access_pod(args.namespace)

    print("\n✅ Download completed!")
    print(f"📁 Results available at: {args.output_dir.absolute()}")
    print("📄 See README.md for file descriptions")


if __name__ == "__main__":
    main()
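The `--no-config` flag of the removed script above skipped `*.yaml` and `*.yml` files during download, whereas the `kubectl cp` commands that replace it copy a directory as it is. One way to get a similar result is to delete those files locally after the copy. The sketch below is an illustration under that assumption and is not part of this commit; `prune_configs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# The two extensions the removed --no-config flag excluded.
CONFIG_SUFFIXES = {".yaml", ".yml"}


def prune_configs(root: Path) -> List[Path]:
    """Delete YAML files under root (recursively) and return the removed paths."""
    removed = []
    # Materialize the listing first so files are not deleted mid-iteration.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in CONFIG_SUFFIXES:
            path.unlink()
            removed.append(path)
    return removed
```

Calling `prune_configs(Path("./benchmarks/results"))` after a download leaves every non-YAML file in place. Like the `find -name` filter in the removed script, the match is case-sensitive.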
#!/usr/bin/env python3
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Manifest Injection Script

Copies any Kubernetes manifest file into the PVC for later use by jobs.
Both the source manifest path and destination path in the PVC are required.

IMPORTANT: The PVC is mounted at /data in the access pod for security reasons.
All destination paths must start with '/data/'.

Usage:
    python3 inject_manifest.py --namespace <namespace> --src <local_manifest.yaml> --dest <absolute_path_in_pvc>

Examples:
    python3 inject_manifest.py --namespace <ns> --src ./disagg.yaml --dest /data/configs/disagg.yaml
    python3 inject_manifest.py --namespace <ns> --src ./my-data.yaml --dest /data/custom/path/data.yaml
"""

import argparse
import sys
from pathlib import Path

from deploy.utils.kubernetes import (
    PVC_ACCESS_POD_NAME,
    check_kubectl_access,
    cleanup_access_pod,
    ensure_clean_access_pod,
    run_command,
)


def copy_manifest(namespace: str, manifest_path: Path, target_path: str) -> None:
    """Copy a manifest file into the PVC via the access pod."""
    pod_name = PVC_ACCESS_POD_NAME

    if not manifest_path.exists():
        print(f"ERROR: Manifest file not found: {manifest_path}")
        sys.exit(1)

    print(f"Copying {manifest_path} to {target_path} in PVC...")

    # Ensure destination directory exists
    target_dir = str(Path(target_path).parent)
    run_command(
        ["kubectl", "exec", pod_name, "-n", namespace, "--", "mkdir", "-p", target_dir],
        capture_output=False,
    )

    # Copy file to pod
    run_command(
        [
            "kubectl",
            "cp",
            str(manifest_path),
            f"{namespace}/{pod_name}:{target_path}",
        ],
        capture_output=False,
    )

    # Verify the file was copied
    result = run_command(
        ["kubectl", "exec", pod_name, "-n", namespace, "--", "ls", "-la", target_path],
        capture_output=True,
    )
    print("✓ Manifest successfully copied to PVC")
    print(f"File details: {result.stdout.strip()}")


def main():
    parser = argparse.ArgumentParser(
        description="Inject a Kubernetes manifest into the PVC",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__,
    )
    parser.add_argument(
        "--namespace",
        "-n",
        required=True,
        help="Kubernetes namespace containing the profiling PVC",
    )
    parser.add_argument(
        "--src", required=True, type=Path, help="Path to manifest file to copy"
    )
    parser.add_argument(
        "--dest",
        required=True,
        help="Absolute target path in PVC (must start with /data/, e.g., /data/configs/agg.yaml)",
    )
    args = parser.parse_args()

    # Validate target_path to prevent directory traversal and ensure it's within PVC
    if not args.dest.startswith("/data/"):
        print("=" * 60)
        print("❌ ERROR: Invalid target path")
        print("=" * 60)
        print("The PVC is mounted at /data in the access pod.")
        print("All paths must start with '/data/' for security reasons.")
        print("")
        print("💡 QUICK FIX:")
        if args.dest.startswith("/"):
            # Suggest the fix
            suggested_path = f"/data{args.dest}"
            print(f" Change: {args.dest}")
            print(f" To: {suggested_path}")
            print("")
            print("📝 Example commands:")
            print(" python3 -m deploy.utils.inject_manifest \\")
            print(f" --namespace {args.namespace} \\")
            print(f" --src {args.src} \\")
            print(f" --dest {suggested_path}")
        else:
            print(f" Use: /data/{args.dest.lstrip('/')}")
        print("")
        print("🔍 Common patterns:")
        print(" /configs/file.yaml → /data/configs/file.yaml")
        print(" /results/data.yaml → /data/results/data.yaml")
        print("=" * 60)
        sys.exit(1)

    if ".." in args.dest:
        print("ERROR: Target path cannot contain '..'")
        sys.exit(1)

    print("🚀 Manifest Injection")
    print("=" * 40)

    # Validate inputs
    check_kubectl_access(args.namespace)

    # Deploy access pod
    ensure_clean_access_pod(args.namespace)
    try:
        # Copy manifest
        copy_manifest(args.namespace, args.src, args.dest)
        print("\n✅ Manifest injection completed!")
        print(f"📁 File available at: {args.dest}")
    finally:
        # Cleanup even on failure
        cleanup_access_pod(args.namespace)


if __name__ == "__main__":
    main()
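The removed script above rejected any destination outside `/data/`, and any path containing `..`, before it copied anything. Plain `kubectl cp` performs no such check, so a wrapper around the new workflow may want to reproduce it. This is a minimal sketch that assumes the same two rules as the removed code; `validate_pvc_path` is a hypothetical name and the function is not part of this commit.

```python
# Mount point of the PVC inside the access pod, per the removed script's docstring.
PVC_MOUNT_PREFIX = "/data/"


def validate_pvc_path(path: str) -> str:
    """Return path unchanged if it passes both checks, otherwise raise ValueError."""
    if not path.startswith(PVC_MOUNT_PREFIX):
        # Offer a corrected path, in the spirit of the removed script's quick-fix hint.
        suggestion = PVC_MOUNT_PREFIX + path.lstrip("/")
        raise ValueError(
            f"path must start with {PVC_MOUNT_PREFIX!r}; did you mean {suggestion!r}?"
        )
    if ".." in path:
        raise ValueError("path cannot contain '..'")
    return path
```

A bare `/data` (no trailing slash) fails the prefix test here, exactly as it does in the `startswith("/data/")` checks of both removed scripts.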
@@ -70,9 +70,9 @@ log "Applying benchmarking manifests to namespace $NAMESPACE"
 export NAMESPACE # ensure envsubst can see it
 for mf in "$(dirname "$0")/manifests"/*.yaml; do
   if [[ -f "$mf" ]]; then
-    # Skip pvc-access-pod.yaml as it's managed by inject_manifest.py
+    # Skip pvc-access-pod.yaml as it's created on-demand by users
     if [[ "$(basename "$mf")" == "pvc-access-pod.yaml" ]]; then
-      log "Skipping $mf (managed by inject_manifest.py)"
+      log "Skipping $mf (created on-demand when accessing PVC)"
       continue
     fi
......
@@ -364,12 +364,15 @@ kubectl apply -f benchmarks/incluster/benchmark_job.yaml -n $NAMESPACE
 ### Step 3: Retrieve Results
 ```bash
-# Download results from PVC (recommended)
-python3 -m deploy.utils.download_pvc_results \
-  --namespace $NAMESPACE \
-  --output-dir ./benchmarks/results/<benchmark-name> \
-  --folder /data/results/<benchmark-name> \
-  --no-config
+# Create access pod (skip this step if access pod is already running)
+kubectl apply -f deploy/utils/manifests/pvc-access-pod.yaml -n $NAMESPACE
+kubectl wait --for=condition=Ready pod/pvc-access-pod -n $NAMESPACE --timeout=60s
+
+# Download the results
+kubectl cp $NAMESPACE/pvc-access-pod:/data/results/<benchmark-name> ./benchmarks/results/<benchmark-name>
+
+# Cleanup
+kubectl delete pod pvc-access-pod -n $NAMESPACE
 ```
 ### Step 4: Generate Plots
......
@@ -379,11 +379,16 @@ For advanced use cases, you can manually deploy using the standalone planner tem
 ```bash
 # After profiling completes, profiling data is stored on the PVC at /data
-# Optional: Download profiling results for local inspection
-python3 -m deploy.utils.download_pvc_results \
-  --namespace $NAMESPACE \
-  --output-dir ./profiling_data \
-  --folder /data
+# OPTIONAL: Download profiling results for local inspection
+# Create access pod (skip this step if access pod is already running)
+kubectl apply -f deploy/utils/manifests/pvc-access-pod.yaml -n $NAMESPACE
+kubectl wait --for=condition=Ready pod/pvc-access-pod -n $NAMESPACE --timeout=60s
+
+# Download the data
+kubectl cp $NAMESPACE/pvc-access-pod:/data ./profiling_data
+
+# Cleanup
+kubectl delete pod pvc-access-pod -n $NAMESPACE
 # Update backend planner manifest as needed, then deploy
 kubectl apply -f examples/backends/<backend>/deploy/disagg_planner.yaml -n $NAMESPACE
......