Commit 5528f3b4 authored by hhzhang16, committed by GitHub

fix: update planner + docs now that profiling results are stored in /data (#4098)


Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
parent 83a3fe4e
...@@ -92,25 +92,38 @@ python3 -m deploy.utils.inject_manifest \ ...@@ -92,25 +92,38 @@ python3 -m deploy.utils.inject_manifest \
--dest /data/configs/disagg.yaml --dest /data/configs/disagg.yaml
``` ```
**Download benchmark/profiling results:** **Download benchmark results:**
```bash ```bash
# After benchmarking or profiling completes, download results # After benchmarking completes, download results
python3 -m deploy.utils.download_pvc_results \ python3 -m deploy.utils.download_pvc_results \
--namespace $NAMESPACE \ --namespace $NAMESPACE \
--output-dir ./pvc_files \ --output-dir ./benchmarks/results \
--folder /data/results \ --folder /data/results \
--no-config # optional: skip *.yaml/*.yml in the download --no-config # optional: skip *.yaml/*.yml in the download
``` ```
**Download profiling results (optional, for local inspection):**
```bash
# Optional: Download profiling data for local analysis
# The planner reads directly from the PVC, so this is only needed for inspection
python3 -m deploy.utils.download_pvc_results \
--namespace $NAMESPACE \
--output-dir ./profiling_data \
--folder /data
```
> **Note on Profiling Results**: When using DGDR (DynamoGraphDeploymentRequest) for SLA-driven profiling, profiling data is stored in `/data/` on the PVC. The planner component reads this data directly from the PVC, so downloading is **optional** - only needed if you want to inspect the profiling results locally (e.g., view performance plots, check configurations).
#### Path Requirements #### Path Requirements
**Important**: The PVC is mounted at `/data` in the access pod for security reasons. All destination paths must start with `/data/`. **Important**: The PVC is mounted at `/data` in the access pod for security reasons. All destination paths must start with `/data/`.
**Common path patterns:** **Common path patterns:**
- `/data/configs/` - Configuration files (DGD manifests) - `/data/configs/` - Configuration files (DGD manifests)
- `/data/results/` - Benchmark results - `/data/results/` - Benchmark results (for download after benchmarking jobs)
- `/data/profiling_results/` - Profiling data - `/data/` - Profiling data (used directly by planner, typically not downloaded)
- `/data/benchmarking/` - Benchmarking artifacts - `/data/benchmarking/` - Benchmarking artifacts
**User-friendly error messages**: If you forget the `/data/` prefix, the script will show a helpful error message with the correct path and example commands. **User-friendly error messages**: If you forget the `/data/` prefix, the script will show a helpful error message with the correct path and example commands.
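The path rule described above can be illustrated with a minimal sketch. This is not the script's actual implementation: the helper name `validate_pvc_path` and the constant `PVC_MOUNT` are hypothetical, and accepting the bare mount point `/data` is an assumption based on profiling data now living directly under `/data`.

```python
import posixpath

# Hypothetical constant: the mount point of the PVC inside the access pod.
PVC_MOUNT = "/data"


def validate_pvc_path(path: str) -> str:
    """Return the normalized path if it lies under PVC_MOUNT, else raise.

    Illustrative only; the real scripts may validate differently.
    """
    # Collapse "..", "." and trailing slashes so "/data/../etc" is rejected.
    normalized = posixpath.normpath(path)
    inside = normalized == PVC_MOUNT or normalized.startswith(PVC_MOUNT + "/")
    if not inside:
        # Suggest the corrected path, mirroring the "Quick Fix" hint.
        suggestion = posixpath.join(PVC_MOUNT, path.lstrip("/"))
        raise ValueError(
            f"Folder path must start with '{PVC_MOUNT}/'. "
            f"Provided: {path}. Did you mean: {suggestion}?"
        )
    return normalized
```

Normalizing before the prefix check means a path such as `/database` or `/data/../etc` is rejected rather than passing a naive `startswith("/data")` test.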
......
...@@ -182,7 +182,7 @@ def main(): ...@@ -182,7 +182,7 @@ def main():
parser.add_argument( parser.add_argument(
"--folder", "--folder",
required=True, required=True,
help="Absolute folder path in the PVC to download, must start with /data/, e.g. /data/profiling_results or /data/benchmarking_results", help="Absolute folder path in the PVC to download, must start with /data",
) )
args = parser.parse_args() args = parser.parse_args()
...@@ -192,10 +192,6 @@ def main(): ...@@ -192,10 +192,6 @@ def main():
print("❌ Error: Folder path must start with '/data/'") print("❌ Error: Folder path must start with '/data/'")
print(f" Provided: {args.folder}") print(f" Provided: {args.folder}")
print(" Quick Fix: Add '/data/' prefix to your path") print(" Quick Fix: Add '/data/' prefix to your path")
print(" Examples:")
print(" /profiling_results → /data/profiling_results")
print(" /benchmarking_results → /data/benchmarking_results")
print(" /configs → /data/configs")
sys.exit(1) sys.exit(1)
print("📥 PVC Results Download") print("📥 PVC Results Download")
......
...@@ -134,7 +134,6 @@ def main(): ...@@ -134,7 +134,6 @@ def main():
print("🔍 Common patterns:") print("🔍 Common patterns:")
print(" /configs/file.yaml → /data/configs/file.yaml") print(" /configs/file.yaml → /data/configs/file.yaml")
print(" /results/data.yaml → /data/results/data.yaml") print(" /results/data.yaml → /data/results/data.yaml")
print(" /profiling_results/... → /data/profiling_results/...")
print("=" * 60) print("=" * 60)
sys.exit(1) sys.exit(1)
......
...@@ -345,14 +345,18 @@ DGDRs are **immutable** - if you need to update SLAs or configuration: ...@@ -345,14 +345,18 @@ DGDRs are **immutable** - if you need to update SLAs or configuration:
### Manual Deployment Control ### Manual Deployment Control
Disable auto-deployment to review configurations before deploying: There are two ways to manually control deployment after profiling:
#### Option 1: Use DGDR-Generated Configuration (Recommended)
Disable auto-deployment to review the generated DGD before applying:
```yaml ```yaml
spec: spec:
autoApply: false autoApply: false
``` ```
Then manually apply the generated DGD: Then manually extract and apply the generated DGD:
```bash ```bash
# Extract generated config # Extract generated config
...@@ -365,6 +369,27 @@ vi my-dgd.yaml ...@@ -365,6 +369,27 @@ vi my-dgd.yaml
kubectl apply -f my-dgd.yaml -n $NAMESPACE kubectl apply -f my-dgd.yaml -n $NAMESPACE
``` ```
The generated DGD includes optimized configurations and the SLA planner component.
#### Option 2: Use Standalone Planner Templates (Advanced)
For advanced use cases, you can manually deploy using the standalone planner templates in `examples/backends/*/deploy/disagg_planner.yaml`:
```bash
# After profiling completes, profiling data is stored on the PVC at /data
# Optional: Download profiling results for local inspection
python3 -m deploy.utils.download_pvc_results \
--namespace $NAMESPACE \
--output-dir ./profiling_data \
--folder /data
# Update backend planner manifest as needed, then deploy
kubectl apply -f examples/backends/<backend>/deploy/disagg_planner.yaml -n $NAMESPACE
```
> **Note**: The standalone templates are provided as examples and may need customization for your model and requirements. The DGDR-generated configuration (Option 1) is recommended as it's automatically tuned to your profiling results and SLA targets.
### Relationship to DynamoGraphDeployment (DGD) ### Relationship to DynamoGraphDeployment (DGD)
- **DGDR**: High-level "intent" - what you want deployed - **DGDR**: High-level "intent" - what you want deployed
......
...@@ -37,7 +37,7 @@ spec: ...@@ -37,7 +37,7 @@ spec:
- --environment=kubernetes - --environment=kubernetes
- --backend=sglang - --backend=sglang
- --adjustment-interval=60 - --adjustment-interval=60
- --profile-results-dir=/data/profiling_results - --profile-results-dir=/data
decode: decode:
dynamoNamespace: dynamo dynamoNamespace: dynamo
envFromSecret: hf-token-secret envFromSecret: hf-token-secret
......
...@@ -57,7 +57,7 @@ spec: ...@@ -57,7 +57,7 @@ spec:
- --environment=kubernetes - --environment=kubernetes
- --backend=trtllm - --backend=trtllm
- --adjustment-interval=60 - --adjustment-interval=60
- --profile-results-dir=/data/profiling_results - --profile-results-dir=/data
- --prometheus-port=9085 - --prometheus-port=9085
TRTLLMDecodeWorker: TRTLLMDecodeWorker:
dynamoNamespace: trtllm-disagg-planner dynamoNamespace: trtllm-disagg-planner
......
...@@ -36,7 +36,7 @@ spec: ...@@ -36,7 +36,7 @@ spec:
- --environment=kubernetes - --environment=kubernetes
- --backend=vllm - --backend=vllm
- --adjustment-interval=60 - --adjustment-interval=60
- --profile-results-dir=/data/profiling_results - --profile-results-dir=/data
VllmDecodeWorker: VllmDecodeWorker:
dynamoNamespace: vllm-disagg-planner dynamoNamespace: vllm-disagg-planner
envFromSecret: hf-token-secret envFromSecret: hf-token-secret
......