# Kubernetes utilities for Dynamo Benchmarking and Profiling

This directory contains utilities and manifests for Dynamo benchmarking and profiling workflows.

## Prerequisites

**Before using these utilities, set up the Dynamo Kubernetes Platform by following the main installation guide:**

👉 **[Follow the Dynamo Kubernetes Platform installation guide](/docs/kubernetes/installation-guide.md) to install the Dynamo Kubernetes Platform first.**

This includes:
1. Installing the Dynamo CRDs
2. Installing the Dynamo Platform (operator, etcd, NATS)
3. Setting up your target namespace

## Contents

- `setup_benchmarking_resources.sh` - sets up benchmarking and profiling resources in your existing Dynamo namespace
- `manifests/`
  - `pvc.yaml` - PVC `dynamo-pvc`
  - `pvc-access-pod.yaml` - short-lived pod for copying profiler results from the PVC
- `kubernetes.py` - helper used by tooling to apply/read resources (e.g., access pod for PVC access)
- `dynamo_deployment.py` - utilities for working with DynamoGraphDeployment resources
- `requirements.txt` - Python dependencies for benchmarking utilities

## Quick start

### Benchmarking Resource Setup

After setting up the Dynamo Kubernetes Platform, run `setup_benchmarking_resources.sh` to prepare your namespace with the additional resources needed for benchmarking and profiling workflows.

The setup script creates a PVC named `dynamo-pvc` with `ReadWriteOnce` (RWO) access mode using your cluster's default storage class. RWO lets a single node mount the volume read-write, which is sufficient for profiling workflows where only one job writes at a time.

If you want to use `ReadWriteMany` (RWX) for concurrent access, modify `deploy/utils/manifests/pvc.yaml` before running the script:

```yaml
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: <your-rwx-capable-storageclass>  # e.g., NFS-based storage
  resources:
    requests:
      storage: 50Gi
```

> [!TIP]
> **Check your cluster's storage classes**
>
> - List storage classes and provisioners:
> ```bash
> kubectl get sc -o wide
> ```
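
Since the setup script relies on the cluster's default storage class, it can help to confirm which class that is. The sketch below is a minimal, hedged example: it assumes you have already parsed the output of `kubectl get sc -o json` into a dictionary, and it looks for the standard `storageclass.kubernetes.io/is-default-class` annotation. The function name and the sample data (including the `example.com/...` provisioners) are hypothetical and not part of the Dynamo tooling.

```python
# Annotation Kubernetes uses to mark a StorageClass as the cluster default.
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_storage_classes(sc_list: dict) -> list:
    """Return (name, provisioner) pairs for StorageClasses marked as default.

    `sc_list` is the parsed JSON from `kubectl get sc -o json`.
    """
    defaults = []
    for item in sc_list.get("items", []):
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            defaults.append((metadata.get("name"), item.get("provisioner")))
    return defaults


# Hypothetical sample shaped like `kubectl get sc -o json` output.
sample = {
    "items": [
        {
            "metadata": {"name": "standard", "annotations": {DEFAULT_ANNOTATION: "true"}},
            "provisioner": "example.com/block",
        },
        {"metadata": {"name": "shared-nfs"}, "provisioner": "example.com/nfs"},
    ]
}
print(default_storage_classes(sample))
```

Whether a given provisioner supports `ReadWriteMany` cannot be read from this output; check the provisioner's own documentation before switching the PVC to RWX.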

Set your namespace (and optionally `HF_TOKEN`), then run the script:
```bash
export NAMESPACE=your-dynamo-namespace
export HF_TOKEN=<HF_TOKEN>  # Optional: for HuggingFace model access

deploy/utils/setup_benchmarking_resources.sh
```

The script applies the following manifest to your existing Dynamo namespace:

- `deploy/utils/manifests/pvc.yaml` - PVC `dynamo-pvc`

If `HF_TOKEN` is set, the script also creates a secret for HuggingFace model access.

After running the setup script, verify that the PVC was created:

```bash
kubectl get pvc dynamo-pvc -n $NAMESPACE
```
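
For scripted checks, the same verification can be done on the JSON form of the PVC. The following is a sketch only, assuming the input is the parsed output of `kubectl get pvc dynamo-pvc -n $NAMESPACE -o json`; the helper names and the sample values are hypothetical.

```python
def pvc_summary(pvc: dict) -> dict:
    """Extract the fields worth checking from `kubectl get pvc <name> -o json`."""
    spec = pvc.get("spec", {})
    status = pvc.get("status", {})
    return {
        "phase": status.get("phase"),
        "access_modes": spec.get("accessModes", []),
        "storage_class": spec.get("storageClassName"),
        "requested": spec.get("resources", {}).get("requests", {}).get("storage"),
    }


def is_bound(pvc: dict) -> bool:
    """True once the claim has been bound to a volume."""
    return pvc_summary(pvc)["phase"] == "Bound"


# Hypothetical sample; real values depend on your cluster and manifest.
sample_pvc = {
    "metadata": {"name": "dynamo-pvc"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "standard",
        "resources": {"requests": {"storage": "50Gi"}},
    },
    "status": {"phase": "Pending"},
}
print(pvc_summary(sample_pvc))
print("bound:", is_bound(sample_pvc))
```

A `Pending` phase is not necessarily an error: if the storage class uses `volumeBindingMode: WaitForFirstConsumer`, the claim stays `Pending` until a pod that mounts it is scheduled.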

### Working with the PVC

The Persistent Volume Claim (PVC) stores configuration files and benchmark/profiling results. Use `kubectl cp` through a temporary access pod to copy files to and from the PVC.

#### Setting Up PVC Access

First, create a temporary access pod to interact with the PVC:

```bash
# Create access pod
kubectl apply -f deploy/utils/manifests/pvc-access-pod.yaml -n $NAMESPACE

# Wait for pod to be ready
kubectl wait --for=condition=Ready pod/pvc-access-pod -n $NAMESPACE --timeout=60s
```

#### Copying Files to the PVC

**Copy deployment configurations for profiling:**

```bash
# Copy a single file
kubectl cp ./my-disagg.yaml $NAMESPACE/pvc-access-pod:/data/configs/disagg.yaml

# Copy an entire directory
kubectl cp ./configs/ $NAMESPACE/pvc-access-pod:/data/configs/
```

#### Downloading Files from the PVC

**Download benchmark results:**

```bash
# Download entire results directory
kubectl cp $NAMESPACE/pvc-access-pod:/data/results ./benchmarks/results

# Download a specific subdirectory
kubectl cp $NAMESPACE/pvc-access-pod:/data/results/benchmark-name ./benchmarks/results/benchmark-name
```
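
The upload and download commands above share one shape: a local path on one side and `<namespace>/pvc-access-pod:<path>` on the other. If you script these transfers, a small helper can build the argument lists and reject paths outside the PVC. This is a minimal sketch, not part of the shipped utilities; the function names are hypothetical, and the `/data` mount path is an assumption inferred from the example paths in this README.

```python
import shlex

ACCESS_POD = "pvc-access-pod"  # pod name used throughout this README
PVC_MOUNT = "/data"            # assumption: mount path implied by the example paths


def _remote(namespace: str, pvc_path: str) -> str:
    """Build the `<namespace>/<pod>:<path>` form that `kubectl cp` expects."""
    if pvc_path != PVC_MOUNT and not pvc_path.startswith(PVC_MOUNT + "/"):
        raise ValueError(f"{pvc_path!r} is outside {PVC_MOUNT}")
    return f"{namespace}/{ACCESS_POD}:{pvc_path}"


def upload_cmd(namespace: str, local_path: str, pvc_path: str) -> list:
    """Argument list for copying a local file or directory into the PVC."""
    return ["kubectl", "cp", local_path, _remote(namespace, pvc_path)]


def download_cmd(namespace: str, pvc_path: str, local_path: str) -> list:
    """Argument list for copying a file or directory out of the PVC."""
    return ["kubectl", "cp", _remote(namespace, pvc_path), local_path]


# Print the commands instead of running them, so the sketch works without a cluster.
print(shlex.join(upload_cmd("my-namespace", "./my-disagg.yaml", "/data/configs/disagg.yaml")))
print(shlex.join(download_cmd("my-namespace", "/data/results", "./benchmarks/results")))
```

To execute a command rather than print it, pass the list to `subprocess.run(cmd, check=True)`; using a list avoids shell quoting issues with unusual file names.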

**Inspect profiling results (optional):**

```bash
# View the generated DGD configuration from profiling
kubectl get configmap dgdr-output-<dgdr-name> -n $NAMESPACE -o yaml

# View the planner profiling data (JSON format)
kubectl get configmap planner-profile-data -n $NAMESPACE -o yaml
```
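
To work with these ConfigMaps as local files, one option is to fetch them with `-o json` instead of `-o yaml` and write each `data` entry to disk. The sketch below assumes the ConfigMap has already been parsed into a dictionary. The helper name is hypothetical, and so is the sample key `example.json`: the actual key names inside `dgdr-output-<dgdr-name>` and `planner-profile-data` are not specified here, so the code simply writes whatever keys it finds.

```python
import tempfile
from pathlib import Path


def write_configmap_data(configmap: dict, out_dir: Path) -> list:
    """Write each `data` entry of a ConfigMap to `out_dir/<key>`; return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key, value in sorted((configmap.get("data") or {}).items()):
        target = out_dir / key
        target.write_text(value, encoding="utf-8")
        written.append(target)
    return written


# Hypothetical sample; the real key names and contents will differ.
sample_cm = {
    "metadata": {"name": "planner-profile-data"},
    "data": {"example.json": '{"placeholder": true}'},
}
with tempfile.TemporaryDirectory() as tmp:
    for path in write_configmap_data(sample_cm, Path(tmp)):
        print(path.name, path.stat().st_size, "bytes")
```

ConfigMap keys are restricted to alphanumerics, `-`, `_`, and `.`, so they are safe to use directly as file names.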

> **Note on Profiling Results**: When using DGDR (DynamoGraphDeploymentRequest) for SLA-driven profiling, profiling data is automatically stored in ConfigMaps:
> - `dgdr-output-<dgdr-name>`: Contains the generated DynamoGraphDeployment YAML
> - `planner-profile-data`: Contains profiling performance data in JSON format for the planner
>
> The planner component reads this data directly from the mounted ConfigMap, so no PVC is needed.

#### Cleanup Access Pod

When finished, delete the access pod:

```bash
kubectl delete pod pvc-access-pod -n $NAMESPACE
```

#### Path Structure

**Common path patterns in the PVC:**
- `/data/configs/` - Configuration files (DGD manifests)
- `/data/results/` - Benchmark results (for download after benchmarking jobs)
- `/data/benchmarking/` - Benchmarking artifacts

#### Next Steps

For complete benchmarking and profiling workflows:
- **Benchmarking Guide**: See [docs/benchmarks/benchmarking.md](../../docs/benchmarks/benchmarking.md) for comparing DynamoGraphDeployments and external endpoints
- **Pre-Deployment Profiling**: See [docs/components/profiler/profiler-guide.md](../../docs/components/profiler/profiler-guide.md) for optimizing configurations before deployment

## Notes

- This setup is focused on benchmarking and profiling resources only - the main Dynamo platform must be installed separately.