# Kubernetes utilities for Dynamo Benchmarking and Profiling

This directory contains utilities and manifests for Dynamo benchmarking and profiling workflows.

## Prerequisites

**Before using these utilities, you must first set up Dynamo Cloud following the main installation guide:**

👉 **[Follow the Dynamo Cloud installation guide](/docs/kubernetes/installation_guide.md) to install the Dynamo Kubernetes Platform first.**

This includes:
1. Installing the Dynamo CRDs
2. Installing the Dynamo Platform (operator, etcd, NATS)
3. Setting up your target namespace
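
Before continuing, you can sanity-check these prerequisites. The function below is a minimal, hypothetical sketch (`check_dynamo_prereqs` is not part of this directory). It assumes that the Dynamo CRD names contain the string `dynamo` and that the platform pods run in your target namespace; adjust both checks if your installation differs.

```bash
# Hypothetical preflight helper; not shipped in deploy/utils.
# Assumes Dynamo CRD names contain "dynamo" and the platform runs in the target namespace.
check_dynamo_prereqs() {
  local ns="${1:-$NAMESPACE}"
  if [ -z "$ns" ]; then
    echo "usage: check_dynamo_prereqs <namespace> (or export NAMESPACE)" >&2
    return 2
  fi

  # 1. Dynamo CRDs are registered with the API server
  if ! kubectl get crd -o name | grep -qi dynamo; then
    echo "No Dynamo CRDs found; install the CRDs first" >&2
    return 1
  fi

  # 2. The target namespace exists
  if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "Namespace $ns not found" >&2
    return 1
  fi

  # 3. The namespace contains pods (expected: operator, etcd, NATS)
  if [ -z "$(kubectl get pods -n "$ns" -o name 2>/dev/null)" ]; then
    echo "No pods found in $ns; is the Dynamo Platform installed there?" >&2
    return 1
  fi

  echo "Dynamo prerequisites look OK in namespace $ns"
}
```

Example: `check_dynamo_prereqs "$NAMESPACE"`.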

## Contents

- `setup_benchmarking_resources.sh` - Sets up benchmarking and profiling resources in your existing Dynamo namespace
- `manifests/`
  - `pvc.yaml` - PVC `dynamo-pvc`
  - `pvc-access-pod.yaml` - short-lived pod for copying files (such as profiler results) to and from the PVC
- `kubernetes.py` - helper used by tooling to apply and read resources (e.g., the PVC access pod)
- `dynamo_deployment.py` - utilities for working with DynamoGraphDeployment resources
- `requirements.txt` - Python dependencies for the benchmarking utilities

## Quick start

### Benchmarking Resource Setup

After setting up Dynamo Cloud, use `deploy/utils/setup_benchmarking_resources.sh` to prepare your namespace with the additional resources needed for benchmarking and profiling workflows.

The setup script creates a PVC named `dynamo-pvc` with access mode `ReadWriteMany` (RWX). If your cluster's default StorageClass does not support RWX, set `storageClassName` in `deploy/utils/manifests/pvc.yaml` to an RWX-capable class before running the script.

Example (add under `spec` in `deploy/utils/manifests/pvc.yaml`):
```yaml
# ...
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: <your-rwx-storageclass>
  # ...
```

> [!TIP]
> **Check your cluster's storage classes**
>
> - List storage classes and provisioners:
> ```bash
> kubectl get sc -o wide
> ```
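
If you leave `storageClassName` unset, the PVC uses the cluster's default StorageClass, which Kubernetes marks with the `storageclass.kubernetes.io/is-default-class: "true"` annotation. The hypothetical helper below (`default_storage_class` is an illustrative name) prints that default so you can check it against your provisioner's documentation. A StorageClass does not itself advertise whether it supports RWX, so this only tells you which class will be used.

```bash
# Hypothetical helper: print the name of the cluster's default StorageClass, if any.
default_storage_class() {
  # Emit "<name> <is-default-class annotation>" per class, then keep the one marked "true"
  kubectl get storageclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}' |
    awk '$2 == "true" { print $1 }'
}
```

An empty result means no class is marked as default, in which case `storageClassName` must be set explicitly.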

Then run the setup script from the repository root:
```bash
export NAMESPACE=your-dynamo-namespace
export HF_TOKEN=<HF_TOKEN>  # Optional: for HuggingFace model access

deploy/utils/setup_benchmarking_resources.sh
```

This script applies the following manifest to your existing Dynamo namespace:

- `deploy/utils/manifests/pvc.yaml` - PVC `dynamo-pvc`

If `HF_TOKEN` is set, the script also creates a secret for HuggingFace model access.

After running the setup script, verify that the PVC exists:

```bash
kubectl get pvc dynamo-pvc -n $NAMESPACE
```
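
The `STATUS` column should eventually read `Bound`. If your StorageClass uses `volumeBindingMode: WaitForFirstConsumer`, the PVC stays `Pending` until a pod mounts it (for example, the access pod described below), so `Pending` right after setup is not necessarily an error. For scripted setups, a minimal polling sketch follows; `wait_for_pvc_bound` is a hypothetical name and the 5-second interval is an arbitrary choice.

```bash
# Hypothetical helper: poll until a PVC reports phase "Bound" or the timeout expires.
wait_for_pvc_bound() {
  local ns="${1:-$NAMESPACE}"
  local pvc="${2:-dynamo-pvc}"
  local timeout="${3:-120}"   # seconds
  local elapsed=0
  local phase=""

  while [ "$elapsed" -lt "$timeout" ]; do
    phase=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "Bound" ]; then
      echo "PVC $pvc is Bound"
      return 0
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done

  echo "PVC $pvc not Bound after ${timeout}s (last phase: ${phase:-unknown})" >&2
  return 1
}
```

Example: `wait_for_pvc_bound "$NAMESPACE" dynamo-pvc 120`.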

### Working with the PVC

The PersistentVolumeClaim (PVC) stores configuration files and benchmark/profiling results. You cannot read a PVC directly from your workstation, so mount it in a temporary access pod and use `kubectl cp` to copy files to and from it.

#### Setting Up PVC Access

First, create a temporary access pod to interact with the PVC:

```bash
# Create access pod
kubectl apply -f deploy/utils/manifests/pvc-access-pod.yaml -n $NAMESPACE

# Wait for pod to be ready
kubectl wait --for=condition=Ready pod/pvc-access-pod -n $NAMESPACE --timeout=60s
```

#### Copying Files to the PVC

**Copy deployment configurations for profiling:**

```bash
# Copy a single file
kubectl cp ./my-disagg.yaml $NAMESPACE/pvc-access-pod:/data/configs/disagg.yaml

# Copy an entire directory
kubectl cp ./configs/ $NAMESPACE/pvc-access-pod:/data/configs/
```
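
`kubectl cp` unpacks files into the destination's parent directory inside the pod, so copying to a path such as `/data/configs/disagg.yaml` can fail if `/data/configs` does not exist yet on a fresh PVC. The hypothetical helper below (`copy_to_pvc` is an illustrative name) creates the parent directory first. It assumes `NAMESPACE` is exported and that the access pod image provides `mkdir` (and `tar`, which `kubectl cp` already requires).

```bash
# Hypothetical helper: copy a local file or directory to a path on the PVC,
# creating the destination's parent directory in the pod first.
# Assumes NAMESPACE is exported and pvc-access-pod is running.
copy_to_pvc() {
  local src="${1:?usage: copy_to_pvc <local-path> <remote-path>}"
  local dest="${2:?usage: copy_to_pvc <local-path> <remote-path>}"
  local ns="${NAMESPACE:?export NAMESPACE first}"

  kubectl exec -n "$ns" pvc-access-pod -- mkdir -p "$(dirname "$dest")" || return 1
  kubectl cp "$src" "$ns/pvc-access-pod:$dest"
}
```

Example: `copy_to_pvc ./my-disagg.yaml /data/configs/disagg.yaml`.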

#### Downloading Files from the PVC

**Download benchmark results:**

```bash
# Download entire results directory
kubectl cp $NAMESPACE/pvc-access-pod:/data/results ./benchmarks/results

# Download a specific subdirectory
kubectl cp $NAMESPACE/pvc-access-pod:/data/results/benchmark-name ./benchmarks/results/benchmark-name
```
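
`kubectl cp` works by running `tar` in the container, so for large result directories you can also stream the archive yourself through `kubectl exec`. The sketch below is hypothetical (`download_from_pvc` is not provided by this directory) and assumes the access pod image includes `tar`. Using `tar -C <parent> <name>` keeps archive paths relative, so `/data/results` lands in `<local-dest>/results`.

```bash
# Hypothetical helper: stream a PVC directory to the local machine as a tar archive.
# Assumes NAMESPACE is exported and the pvc-access-pod image provides tar.
download_from_pvc() {
  local remote="${1:?usage: download_from_pvc <remote-dir> [local-dest]}"
  local dest="${2:-.}"
  local ns="${NAMESPACE:?export NAMESPACE first}"

  mkdir -p "$dest"
  # Archive <name> relative to its parent so paths start at "results/", not "/data/results/".
  # No -t flag on exec: a TTY would corrupt the binary tar stream.
  kubectl exec -n "$ns" pvc-access-pod -- \
    tar cf - -C "$(dirname "$remote")" "$(basename "$remote")" |
    tar xf - -C "$dest"
}
```

Example: `download_from_pvc /data/results ./benchmarks` creates `./benchmarks/results/`.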

**Inspect profiling results (optional; these are stored in ConfigMaps rather than on the PVC):**

```bash
# View the generated DGD configuration from profiling
kubectl get configmap dgdr-output-<dgdr-name> -n $NAMESPACE -o yaml

# View the planner profiling data (JSON format)
kubectl get configmap planner-profile-data -n $NAMESPACE -o yaml
```

> **Note on Profiling Results**: When using DGDR (DynamoGraphDeploymentRequest) for SLA-driven profiling, profiling data is automatically stored in ConfigMaps:
> - `dgdr-output-<dgdr-name>`: Contains the generated DynamoGraphDeployment YAML
> - `planner-profile-data`: Contains profiling performance data in JSON format for the planner
>
> The planner component reads this data directly from the mounted ConfigMap, so no PVC is needed.
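
To save a ConfigMap's contents as plain files rather than reading them out of YAML, you can iterate over its `data` keys with kubectl's `go-template` output. The sketch below is hypothetical (`dump_configmap` is an illustrative name) and does not assume any particular key names. It relies on ConfigMap keys being restricted to alphanumerics, `-`, `_` and `.`, which makes them safe to use as file names and to split on whitespace.

```bash
# Hypothetical helper: write each key of a ConfigMap's .data to its own file.
# Assumes NAMESPACE is exported.
dump_configmap() {
  local cm="${1:?usage: dump_configmap <configmap-name> [out-dir]}"
  local outdir="${2:-./$cm}"
  local ns="${NAMESPACE:?export NAMESPACE first}"
  local key

  mkdir -p "$outdir"
  # List the data keys, one per line
  for key in $(kubectl get configmap "$cm" -n "$ns" \
      -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'); do
    # Print the value stored under a single key, verbatim
    kubectl get configmap "$cm" -n "$ns" \
      -o go-template="{{index .data \"$key\"}}" > "$outdir/$key"
  done
}
```

Example: `dump_configmap planner-profile-data ./profiling`.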

#### Cleanup Access Pod

When finished, delete the access pod:

```bash
kubectl delete pod pvc-access-pod -n $NAMESPACE
```
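
If you only need the access pod for a single copy, you can tie its lifetime to that command so it is always cleaned up, even when the command fails. A minimal, hypothetical sketch (`with_pvc_access_pod` is not part of this directory): it reuses the `apply`, `wait` and `delete` commands shown above, adds `--ignore-not-found` to the delete, and must be run from the repository root because of the relative manifest path.

```bash
# Hypothetical helper: create the access pod, run one command, then always delete the pod.
# Run from the repository root (the manifest path is relative); assumes NAMESPACE is exported.
with_pvc_access_pod() {
  local ns="${NAMESPACE:?export NAMESPACE first}"
  local status=0

  kubectl apply -f deploy/utils/manifests/pvc-access-pod.yaml -n "$ns" || return 1
  if ! kubectl wait --for=condition=Ready pod/pvc-access-pod -n "$ns" --timeout=60s; then
    kubectl delete pod pvc-access-pod -n "$ns" --ignore-not-found
    return 1
  fi

  # Run the caller's command and remember its exit status
  "$@" || status=$?

  kubectl delete pod pvc-access-pod -n "$ns" --ignore-not-found
  return "$status"
}
```

Example: `with_pvc_access_pod kubectl cp "$NAMESPACE/pvc-access-pod:/data/results" ./benchmarks/results`.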

#### Path Structure

**Common path patterns in the PVC:**
- `/data/configs/` - Configuration files (DGD manifests)
- `/data/results/` - Benchmark results (for download after benchmarking jobs)
- `/data/benchmarking/` - Benchmarking artifacts
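
To see which of these directories exist and how full the volume is, a hypothetical helper (`pvc_usage`) can run `df` and `du` inside the access pod. It assumes the pod's image provides `sh`, `df` and `du`, as most base images do, and it silently skips directories that have not been created yet.

```bash
# Hypothetical helper: show volume usage and per-directory sizes on the PVC.
# Assumes NAMESPACE is exported and the access pod image provides sh, df and du.
pvc_usage() {
  local ns="${NAMESPACE:?export NAMESPACE first}"
  # The script runs inside the pod; directories that do not exist yet are skipped
  kubectl exec -n "$ns" pvc-access-pod -- sh -c '
    df -h /data
    for d in /data/configs /data/results /data/benchmarking; do
      [ -d "$d" ] && du -sh "$d"
    done
    true'
}
```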

#### Next Steps

For complete benchmarking and profiling workflows:
- **Benchmarking Guide**: See [docs/benchmarks/benchmarking.md](../../docs/benchmarks/benchmarking.md) for comparing DynamoGraphDeployments and external endpoints
- **Pre-Deployment Profiling**: See [docs/benchmarks/sla_driven_profiling.md](../../docs/benchmarks/sla_driven_profiling.md) for optimizing configurations before deployment

## Notes

- This setup covers only benchmarking and profiling resources; the main Dynamo platform must be installed separately.