Unverified commit 6e56bad6, authored by Jonathan Tong and committed by GitHub

feat: Add Tiltfile for operator local development with live-reload (#6971)


Signed-off-by: Jont828 <jt572@cornell.edu>
parent 7fb597a2
...@@ -35,5 +35,7 @@ tilt-settings.local.yaml
*.swo
*~
# Tilt per-developer settings (defaults are defined in the Tiltfile)
tilt-settings.yaml
!env*
\ No newline at end of file
...@@ -31,6 +31,76 @@ Built with [Kubebuilder](https://book.kubebuilder.io/), it follows Kubernetes be ...@@ -31,6 +31,76 @@ Built with [Kubebuilder](https://book.kubebuilder.io/), it follows Kubernetes be
make make
``` ```
### Local development with Tilt
[Tilt](https://docs.tilt.dev/install.html) provides a live-reload development loop for the operator. It compiles the Go binary locally, builds a minimal Docker image, renders the production Helm chart, and deploys everything to your cluster. On code changes, Tilt recompiles and live-updates the binary without a full image rebuild — giving fast iteration on controller logic against a real cluster.
#### Prerequisites
The following tools must be installed and available in your `PATH` before running `tilt up`:
| Tool | Version | Purpose | Install |
|------|---------|---------|---------|
| [Go](https://go.dev/doc/install) | ≥ 1.25 | Compiles the manager binary locally | [go.dev/doc/install](https://go.dev/doc/install) |
| [Tilt](https://docs.tilt.dev/install.html) | latest | Live-reload dev loop orchestrator | [docs.tilt.dev/install](https://docs.tilt.dev/install.html) |
| [Helm](https://helm.sh/docs/intro/install/) | v3 | Renders the platform Helm chart | [helm.sh/docs/intro/install](https://helm.sh/docs/intro/install/) |
| [kubectl](https://kubernetes.io/docs/tasks/tools/) | ≥ 1.29 | Applies CRDs and creates the namespace | [kubernetes.io/docs/tasks/tools](https://kubernetes.io/docs/tasks/tools/) |
| [Docker](https://docs.docker.com/get-docker/) | latest | Builds the live-update container image | [docs.docker.com/get-docker](https://docs.docker.com/get-docker/) |
**Conditional prerequisites** (only needed when `skip_codegen: false`, the default):
| Tool | Version | Purpose | Install |
|------|---------|---------|---------|
| [yq](https://github.com/mikefarah/yq) | v4+ | Post-processes generated CRD YAML | `make ensure-yq` or [github.com/mikefarah/yq](https://github.com/mikefarah/yq) |
| [Python 3](https://www.python.org/) + [pydantic](https://docs.pydantic.dev/) | 3.x | Generates Pydantic models from Go types (`make generate`) | `pip install pydantic` |
> **Tip:** Set `skip_codegen: true` in `tilt-settings.yaml` to skip CRD/code generation on every reload. This removes the yq/Python requirement and speeds up iteration when you haven't changed API types.
**Cluster:** You need a Kubernetes cluster (kind, minikube, GKE, EKS, bare-metal, etc.) with a kubeconfig context that Tilt can reach. If your cluster has GPUs and you want to test DGD/DGDR workloads end-to-end, the [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html) should be installed on the cluster.
#### Setup
1. **Create `tilt-settings.yaml`** in `deploy/operator/` with this minimal config:
```yaml
allowed_contexts:
- h100 # Change to your Kubernetes context
registry: docker.io/myuser # Change to your Docker registry
```
2. **Run Tilt**:
```bash
cd deploy/operator
tilt up
```
The Tilt web UI is served at http://localhost:10350 and shows resource status and logs.
#### Features
- **Fast iteration**: On code changes, Tilt recompiles the manager binary and live-updates it into the running container — no full image rebuild needed
- **Real cluster testing**: Reconciles against your actual Kubernetes cluster (kind, minikube, GKE, AKS, etc.)
- **CRD + Helm rendering**: Automatically applies CRDs and renders the platform Helm chart with your configuration
- **Infrastructure toggles**: Control NATS, etcd, KAI scheduler, and Grove via `tilt-settings.yaml`
#### Optional configuration
Additional settings available in `tilt-settings.yaml`:
```yaml
# Infrastructure toggles (control which components are deployed)
enable_nats: true # Enable NATS messaging (default: true, required for DGD/DGDR)
enable_etcd: false # Enable etcd service discovery (default: false)
enable_kai_scheduler: false # Enable KAI GPU-aware scheduler (default: false)
enable_grove: false # Enable Grove orchestrator (default: false)
# Other settings
namespace: dynamo-system # Kubernetes namespace for operator deployment
skip_codegen: false # Skip code generation for faster reloads if API unchanged
image_pull_secret: "" # Name of Secret for private Docker registries
helm_values: {} # Extra Helm value overrides for platform chart
operator_version: "0.0.0-dev" # Override operator version (default: from Chart.yaml)
```
### Install
See [Dynamo Kubernetes Platform Installation Guide](/docs/kubernetes/installation-guide.md) for installation instructions.
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Tiltfile for developing the Dynamo Kubernetes Operator.
#
# Usage:
# cd deploy/operator
# # edit tilt-settings.yaml as needed
# tilt up
#
# What it does:
# 1. Compiles the Go manager binary locally (fast, native).
# 2. Builds a minimal Docker image containing only the binary.
# 3. Renders the production Helm chart (deploy/helm/charts/platform) with
# `helm template`, applies CRDs via kubectl, and deploys all rendered
# resources via k8s_yaml().
# 4. On code change Tilt recompiles the binary and live-updates it into the
# running container — no full image rebuild needed.
#
# Prerequisites (must be in PATH):
# - Go >= 1.25 — compiles the manager binary locally
# - tilt — live-reload orchestrator (https://docs.tilt.dev/install.html)
# - helm v3 — renders the platform Helm chart
# - kubectl >= 1.29 — applies CRDs and creates the namespace
# - docker — builds the live-update container image
# - A Kubernetes cluster reachable via your current kubeconfig context
#
# Conditional (only when skip_codegen is false, the default):
# - yq v4+ — post-processes generated CRD YAML (run `make ensure-yq`)
# - python3 + pydantic — generates Pydantic models from Go types
#
# The tilt restart_process extension is auto-fetched on first `tilt up`.
load('ext://restart_process', 'docker_build_with_restart')
# ---------------------------------------------------------------------------
# Settings — defaults are defined here; tilt-settings.yaml overrides them.
# ---------------------------------------------------------------------------
settings = {
    'namespace': 'dynamo-system',
    'enable_nats': True,            # required for DGD/DGDR workloads
    'enable_etcd': False,           # only if discoveryBackend is "etcd"
    'enable_kai_scheduler': False,  # GPU-aware scheduling for multi-node
    'enable_grove': False,          # PodClique-based multi-node orchestration
    'skip_codegen': False,          # skip make generate/manifests for faster iteration
    'image_pull_secret': '',        # name of docker-registry Secret for private registries
    'helm_values': {},              # extra --set overrides passed to helm template
}
if os.path.exists('tilt-settings.yaml'):
    data = read_yaml('tilt-settings.yaml', default={})
    if data:
        settings.update(data)
if 'allowed_contexts' in settings:
    allow_k8s_contexts(settings['allowed_contexts'])
# Registry — resolved from (highest priority wins):
# 1. REGISTRY env var (e.g. REGISTRY=docker.io/myuser tilt up)
# 2. "registry" in tilt-settings.yaml
# When set the operator image is pushed as <registry>/controller:tilt-dev.
REGISTRY = os.getenv('REGISTRY', settings.get('registry', ''))
if REGISTRY:
    REGISTRY = REGISTRY.rstrip('/')
NAMESPACE = settings['namespace']
HELM_VALUES = settings['helm_values']
ENABLE_NATS = settings['enable_nats']
ENABLE_ETCD = settings['enable_etcd']
ENABLE_KAI_SCHEDULER = settings['enable_kai_scheduler']
ENABLE_GROVE = settings['enable_grove']
IMAGE_PULL_SECRET = settings['image_pull_secret']
# ---------------------------------------------------------------------------
# Operator version — passed as --operator-version to the manager binary.
# The Helm chart uses .Chart.AppVersion; for Tilt dev we read it from the
# operator subchart's Chart.yaml so it stays in sync automatically.
# Override via tilt-settings.yaml if needed:
#
# tilt-settings.yaml:
# operator_version: "1.2.3"
# ---------------------------------------------------------------------------
def _read_chart_app_version():
    """Read appVersion from the operator subchart's Chart.yaml."""
    chart_path = os.path.join(
        os.getcwd(), '..', 'helm', 'charts', 'platform',
        'components', 'operator', 'Chart.yaml')
    if os.path.exists(chart_path):
        chart = read_yaml(chart_path, default={})
        if chart and 'appVersion' in chart:
            return str(chart['appVersion'])
    return '0.0.0-dev'
OPERATOR_VERSION = settings.get('operator_version', _read_chart_app_version())
# ---------------------------------------------------------------------------
# Paths (relative to this Tiltfile, i.e. deploy/operator/)
# ---------------------------------------------------------------------------
OPERATOR_DIR = os.getcwd() # deploy/operator
HELM_CHART = os.path.join(OPERATOR_DIR, '..', 'helm', 'charts', 'platform') # deploy/helm/charts/platform
CRD_DIR = os.path.join(HELM_CHART, 'components', 'operator', 'crds')
IMG_NAME = 'controller'
IMG_TAG = 'tilt-dev'
IMG = (REGISTRY + '/' + IMG_NAME) if REGISTRY else IMG_NAME
IMG_REF = IMG + ':' + IMG_TAG
# ---------------------------------------------------------------------------
# Compile the manager binary locally (much faster than building in Docker)
# ---------------------------------------------------------------------------
def compile_manager():
    return 'CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o tilt_bin/manager ./cmd/main.go'
local_resource(
    'manager-build',
    compile_manager(),
    deps=[
        'api/',
        'cmd/',
        'internal/',
        'go.mod',
        'go.sum',
    ],
    ignore=['**/zz_generated.deepcopy.go'],
    labels=['operator'],
)
# ---------------------------------------------------------------------------
# CRDs — regenerate & apply via server-side apply on change
# ---------------------------------------------------------------------------
SKIP_CODEGEN = settings['skip_codegen']
_crd_cmd = 'kubectl apply --server-side --force-conflicts -f ' + CRD_DIR
if not SKIP_CODEGEN:
    _crd_cmd = 'make generate && make manifests && ' + _crd_cmd
local_resource(
    'crds',
    _crd_cmd,
    deps=['api/'],
    ignore=['**/zz_generated.deepcopy.go'],
    labels=['operator'],
)
# ---------------------------------------------------------------------------
# Helm template → k8s_yaml
#
# Renders the production Helm chart (deploy/helm/charts/platform) with the
# operator and required infrastructure (NATS by default). This gives you a
# fully working cluster where you can apply DGDR/DGD resources and have them
# reconcile into real workloads on your GPU nodes — while live-reloading the
# controller binary on every code change.
#
# The chart has no Helm hooks — webhook certificates, CA bundle injection,
# and MPI SSH key generation are all handled by the operator binary at
# runtime (auto mode).
# ---------------------------------------------------------------------------
def render_helm():
    """Render the platform Helm chart with the operator and any enabled subcharts."""
    helm_cmd = [
        'helm', 'template', 'dynamo', HELM_CHART,
        '--namespace', NAMESPACE,
        '--set', 'dynamo-operator.enabled=true',
        # Subcharts: NATS is on by default (workers need it)
        '--set', 'nats.enabled=%s' % str(ENABLE_NATS).lower(),
        '--set', 'dynamo-operator.nats.enabled=%s' % str(ENABLE_NATS).lower(),
        '--set', 'global.etcd.install=%s' % str(ENABLE_ETCD).lower(),
        '--set', 'global.kai-scheduler.install=%s' % str(ENABLE_KAI_SCHEDULER).lower(),
        '--set', 'global.grove.install=%s' % str(ENABLE_GROVE).lower(),
        # Point image at our Tilt-managed image
        '--set', 'dynamo-operator.controllerManager.manager.image.repository=' + IMG,
        '--set', 'dynamo-operator.controllerManager.manager.image.tag=' + IMG_TAG,
        '--set', 'dynamo-operator.controllerManager.manager.image.pullPolicy=IfNotPresent',
        # We apply CRDs ourselves in the local_resource above
        '--set', 'dynamo-operator.upgradeCRD=false',
        '--skip-crds',
    ]
    # Wire in imagePullSecrets when a pull secret is configured
    if IMAGE_PULL_SECRET:
        helm_cmd += ['--set', 'dynamo-operator.imagePullSecrets[0].name=' + IMAGE_PULL_SECRET]
    # Append user-provided Helm overrides from tilt-settings
    for k, v in HELM_VALUES.items():
        helm_cmd += ['--set', '%s=%s' % (k, v)]
    data = local(helm_cmd, quiet=True)
    # Decode the YAML stream so we can patch individual documents
    decoded = decode_yaml_stream(data)
    patched = []
    for doc in decoded:
        if doc == None:
            continue
        # Ensure namespaced resources land in the target namespace.
        # Cluster-scoped kinds must not have a namespace set.
        _cluster_scoped_kinds = [
            'ClusterRole', 'ClusterRoleBinding',
            'ValidatingWebhookConfiguration', 'MutatingWebhookConfiguration',
            'CustomResourceDefinition', 'Namespace',
            'PriorityClass', 'StorageClass', 'IngressClass',
        ]
        kind = doc.get('kind', '')
        if 'metadata' in doc and 'namespace' not in doc['metadata'] and kind not in _cluster_scoped_kinds:
            doc['metadata']['namespace'] = NAMESPACE
        # Strip securityContext so Tilt's live_update (writing into the
        # container as root) doesn't get blocked by non-root restrictions.
        if doc.get('kind') == 'Deployment':
            spec = doc.get('spec', {}).get('template', {}).get('spec', {})
            spec.pop('securityContext', None)
            for c in spec.get('containers', []):
                c.pop('securityContext', None)
        patched.append(doc)
    return encode_yaml_stream(patched)
# Create the namespace before applying anything else
local('kubectl create namespace %s || true' % NAMESPACE, quiet=True)
k8s_yaml(render_helm())
# ---------------------------------------------------------------------------
# Docker image — minimal container with just the compiled binary
# ---------------------------------------------------------------------------
DOCKERFILE = '''
FROM alpine:3.20 AS base
RUN apk add --no-cache ca-certificates
FROM base
WORKDIR /
COPY ./tilt_bin/manager /manager
COPY ./tilt_bin/manager /workspace/manager
ENTRYPOINT ["/manager"]
'''
docker_build_with_restart(
    IMG_REF,
    context='.',
    dockerfile_contents=DOCKERFILE,
    entrypoint=['/manager', '--config=/etc/dynamo-operator/config.yaml',
                '--operator-version=' + OPERATOR_VERSION],
    only=['./tilt_bin/manager'],
    live_update=[
        sync('./tilt_bin/manager', '/manager'),
    ],
)
if not REGISTRY:
    print('WARNING: no registry configured; image will only be available locally.')
    print('         Set "registry" in tilt-settings.yaml or pass REGISTRY env var.')
# ---------------------------------------------------------------------------
# Resource grouping — keep the Tilt UI tidy
# ---------------------------------------------------------------------------
k8s_resource(
    workload='dynamo-dynamo-operator-controller-manager',
    new_name='operator',
    labels=['operator'],
    port_forwards=['8081:8081'],  # health endpoint
    resource_deps=['crds', 'manager-build'],
)
# Group subchart workloads in the Tilt UI
if ENABLE_NATS:
    k8s_resource(
        workload='dynamo-nats',
        labels=['infrastructure'],
    )
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
...@@ -69,6 +69,8 @@ navigation:
  path: kubernetes/autoscaling.md
- page: Rolling Update
  path: kubernetes/rolling-update.md
- page: Developing with Tilt
  path: kubernetes/tilt-dev-setup.md
- page: Inference Gateway (GAIE)
  path: kubernetes/inference-gateway.md
- page: Snapshot
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Developing the Operator with Tilt
subtitle: Fast, live-reload development loop for the Dynamo Kubernetes operator
---
## Overview
[Tilt](https://tilt.dev) provides a live-reload development environment for the
Dynamo Kubernetes operator. Instead of manually building images, pushing to a
registry, and redeploying on every change, Tilt watches your source files and
automatically recompiles the Go binary, syncs it into the running container, and
restarts the process — all in seconds.
Under the hood, the Tiltfile:
1. **Compiles** the Go manager binary locally (`CGO_ENABLED=0`).
2. **Builds** a minimal Docker image containing only the binary.
3. **Renders** the production Helm chart (`deploy/helm/charts/platform`) with
`helm template`, applies CRDs via `kubectl`, and deploys all rendered
resources.
4. **Live-updates** the binary inside the running container on every code
change — no full image rebuild required.
This gives you a fully working cluster where you can apply `DynamoGraphDeployment`
and `DynamoGraphDeploymentRequest` resources and have them reconcile into real
workloads while you iterate on controller logic, with each recompile-and-restart
cycle taking a few seconds.
## Prerequisites
| Tool | Version | Purpose |
|------|---------|---------|
| [Tilt](https://docs.tilt.dev/install.html) | v0.33+ | Development orchestration |
| [Helm](https://helm.sh/docs/intro/install/) | v3 | Chart rendering |
| [Go](https://go.dev/dl/) | 1.25+ | Compiling the operator |
| [kubectl](https://kubernetes.io/docs/tasks/tools/) | 1.29+ | Cluster access; applies CRDs and creates the namespace |
| [Docker](https://docs.docker.com/get-docker/) | latest | Builds the operator container image |
| A Kubernetes cluster | - | kind, minikube, or remote cluster |
You also need a **container registry** that is accessible to your cluster's
nodes, so they can pull the operator image Tilt builds. If you use a local
cluster like kind with a local registry, Tilt can push there directly.
## Quick Start
```bash
cd deploy/operator
# Create your personal settings file (gitignored)
cat > tilt-settings.yaml <<EOF
allowed_contexts:
- my-cluster-context
registry: docker.io/myuser
skip_codegen: true
EOF
# Launch
tilt up
```
Tilt opens a terminal UI and a web dashboard at <http://localhost:10350>.
The dashboard shows resource status, build logs, and port-forwards.
Press **Space** in the terminal to open the web UI. Press **Ctrl-C** to
stop Tilt itself; the deployed resources stay in the cluster until you run
`tilt down`.
![Tilt web UI showing the operator, CRDs, and infrastructure resources](../assets/img/tilt-ui.png)
## Configuration
All configuration is optional. The Tiltfile defines sensible defaults for every
setting, and `tilt-settings.yaml` is gitignored so your personal values
(cluster context, registry, etc.) never leak into the repo.
Create `deploy/operator/tilt-settings.yaml` with any of the settings below:
```yaml
# Kubernetes contexts Tilt is allowed to connect to.
# Safety guard: prevents accidental deployments to production clusters.
allowed_contexts:
- my-cluster-context
# Container registry for the operator image.
# Can also be set via the REGISTRY env var (env var takes precedence).
registry: docker.io/myuser
# Skip running `make generate && make manifests` before applying CRDs.
# Set to true when you haven't changed API types (faster iteration).
skip_codegen: true
# Target namespace for the operator and related resources.
# namespace: dynamo-system
# Subchart toggles
# enable_nats: true # Required for DGD/DGDR workloads (default: true)
# enable_etcd: false # Only if discoveryBackend is "etcd"
# enable_kai_scheduler: false # GPU-aware scheduling for multi-node
# enable_grove: false # PodClique-based multi-node orchestration
# Extra Helm value overrides (applied on top of subchart toggles)
# helm_values:
# dynamo-operator.discoveryBackend: kubernetes
# dynamo-operator.natsAddr: "nats://external-nats:4222"
```
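How the subchart toggles and `helm_values` end up as `helm template` flags can be sketched in plain Python. This is a simplified model of the Tiltfile's `render_helm()` logic, covering only a subset of the flags it sets; the function name `build_helm_args` is illustrative and does not exist in the repository.

```python
def build_helm_args(settings):
    """Build `--set` flags: subchart toggles first, then user overrides."""
    def flag(value):
        # Helm expects lowercase booleans (true / false).
        return str(bool(value)).lower()

    args = [
        "--set", "nats.enabled=" + flag(settings.get("enable_nats", True)),
        "--set", "global.etcd.install=" + flag(settings.get("enable_etcd", False)),
        "--set", "global.kai-scheduler.install=" + flag(settings.get("enable_kai_scheduler", False)),
        "--set", "global.grove.install=" + flag(settings.get("enable_grove", False)),
    ]
    # User overrides are appended last, so they win when a key is repeated.
    for key, value in settings.get("helm_values", {}).items():
        args += ["--set", "%s=%s" % (key, value)]
    return args


example = build_helm_args({
    "enable_grove": True,
    "helm_values": {"dynamo-operator.discoveryBackend": "kubernetes"},
})
print(" ".join(example))
```

Because Helm gives priority to the right-most `--set` for a repeated key, anything in `helm_values` can override a toggle-derived value.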
### Settings Reference
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `allowed_contexts` | list | *(none)* | Kubernetes contexts Tilt may connect to. Prevents accidental production deploys. |
| `registry` | string | `""` | Container registry prefix (e.g. `docker.io/myuser`). Also settable via `REGISTRY` env var, which takes precedence. |
| `namespace` | string | `dynamo-system` | Namespace for the operator Deployment and related resources. |
| `skip_codegen` | bool | `false` | Skip `make generate && make manifests` before applying CRDs. Set to `true` when you haven't changed API types. |
| `enable_nats` | bool | `true` | Deploy NATS subchart. Required for DGD/DGDR workloads (workers use it for communication). |
| `enable_etcd` | bool | `false` | Deploy etcd subchart. Only needed when `discoveryBackend` is `etcd`. |
| `enable_kai_scheduler` | bool | `false` | Deploy kai-scheduler for GPU-aware scheduling in multi-node setups. |
| `enable_grove` | bool | `false` | Deploy Grove for PodClique-based multi-node orchestration. |
| `image_pull_secret` | string | `""` | Name of a `docker-registry` Secret for pulling images from private registries. |
| `helm_values` | map | `{}` | Arbitrary `--set` overrides passed to `helm template`. |
| `operator_version` | string | *(from Chart.yaml)* | Operator `--operator-version` flag. Defaults to `appVersion` from the operator subchart. |
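
The way defaults and overrides combine can be sketched in Python. This is a minimal model of the Tiltfile's merge, assuming the YAML file has already been parsed into a dict; `resolve_settings` and its `chart_app_version` parameter are hypothetical names used only for illustration.

```python
DEFAULTS = {
    "namespace": "dynamo-system",
    "enable_nats": True,
    "enable_etcd": False,
    "enable_kai_scheduler": False,
    "enable_grove": False,
    "skip_codegen": False,
    "image_pull_secret": "",
    "helm_values": {},
}


def resolve_settings(overrides, chart_app_version=None):
    """Merge user overrides onto the defaults, then resolve operator_version."""
    settings = dict(DEFAULTS)
    settings.update(overrides or {})
    # Precedence: explicit setting, then chart appVersion, then a dev placeholder.
    if "operator_version" not in settings:
        settings["operator_version"] = chart_app_version or "0.0.0-dev"
    return settings


resolved = resolve_settings({"skip_codegen": True}, chart_app_version="1.2.3")
print(resolved["skip_codegen"], resolved["namespace"], resolved["operator_version"])
```

The merge is shallow: a `helm_values` map in `tilt-settings.yaml` replaces the default map as a whole rather than being merged key by key.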
### Registry Configuration
The operator image needs to be pullable by your cluster's nodes. The registry is resolved in priority order:
1. **`REGISTRY` env var**: `REGISTRY=docker.io/myuser tilt up`
2. **`registry` in `tilt-settings.yaml`**
The image is pushed as `<registry>/controller:tilt-dev`.
<Warning>
If no registry is configured, the image is only available locally. This works
with kind using a local registry but will fail on remote clusters.
</Warning>
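
The resolution order and the resulting image reference can be sketched in Python. This mirrors the Tiltfile's logic in simplified form; the helper name `resolve_image_ref` and its `env` parameter are assumptions made for the example.

```python
import os


def resolve_image_ref(settings, env=None):
    """Return the image reference Tilt would build, e.g. <registry>/controller:tilt-dev."""
    env = os.environ if env is None else env
    # The REGISTRY env var wins over the "registry" key in tilt-settings.yaml.
    registry = env.get("REGISTRY", settings.get("registry", "")).rstrip("/")
    name = "controller"
    image = registry + "/" + name if registry else name
    return image + ":tilt-dev"


print(resolve_image_ref({"registry": "docker.io/myuser"}, env={"REGISTRY": "ghcr.io/myorg"}))
print(resolve_image_ref({}, env={}))
```

With no registry from either source, the reference is the bare `controller:tilt-dev`, which only a cluster sharing the local image store can use.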
## How It Works
When you run `tilt up`, Tilt runs `manager-build` and `crds` first; `operator`
waits for both of them (they are its `resource_deps`):
```
manager-build    Compile Go binary locally
crds             Apply CRDs via server-side apply
  └── operator   Deploy operator pod (live-updated), after both of the above
```
The operator handles webhook certificate generation, CA bundle injection, and
MPI SSH key provisioning at runtime — no external setup needed.
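
The ordering constraint between these resources can be illustrated with a topological sort. This sketch uses Python's standard `graphlib` module (Python 3.9+) and only models the `resource_deps` declared in the Tiltfile; it is not how Tilt schedules work internally.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of resources it must wait for.
RESOURCE_DEPS = {
    "manager-build": set(),
    "crds": set(),
    "operator": {"crds", "manager-build"},
}

order = list(TopologicalSorter(RESOURCE_DEPS).static_order())
print(order)
```

`manager-build` and `crds` have no dependency on each other, so they may appear in either order; `operator` always comes last.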
### What Each Resource Does
**manager-build** — Runs `CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build` to
compile the operator binary. Re-runs on changes to `api/`, `cmd/`, `internal/`,
`go.mod`, or `go.sum`.
**crds** — Applies CRDs from the Helm chart via `kubectl apply --server-side`.
When `skip_codegen` is `false`, runs `make generate && make manifests` first.
**operator** — The operator Deployment itself. Tilt watches the compiled binary
and uses `live_update` to sync it into the running container and restart the
process — no image rebuild needed. On startup, the operator's built-in cert
controller generates a self-signed TLS certificate, injects the CA bundle into
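webhook configurations, and creates the MPI SSH secret, using the same mechanism
as in production. The pod itself differs from a production install in one
respect: the Tiltfile strips `securityContext` from the Deployment so that
`live_update` can write into the container.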
### Live Update Cycle
The inner development loop looks like this:
1. You edit Go source files under `deploy/operator/`.
2. Tilt detects the change and recompiles the binary (~2-5 seconds).
3. The new binary is synced into the running container via `live_update`.
4. The process restarts automatically.
5. Your controller changes are live — test by applying a DGD/DGDR.
No `docker build`, no `docker push`, no `kubectl rollout restart`.
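
Which file changes trigger a recompile can be modelled roughly as follows. The watched paths and the ignored file name come from the `manager-build` resource in the Tiltfile; the matching logic is a simplified stand-in for Tilt's own path and glob handling, and `triggers_rebuild` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Paths are relative to deploy/operator/.
WATCHED = ["api/", "cmd/", "internal/", "go.mod", "go.sum"]
IGNORED_NAMES = {"zz_generated.deepcopy.go"}  # approximates **/zz_generated.deepcopy.go


def triggers_rebuild(path):
    """Return True if a change to `path` should recompile the manager binary."""
    if PurePosixPath(path).name in IGNORED_NAMES:
        return False
    for dep in WATCHED:
        if dep.endswith("/"):
            # Directory dependency: any file underneath it counts.
            if path.startswith(dep):
                return True
        elif path == dep:
            return True
    return False


for candidate in ["internal/controller/foo.go", "api/v1alpha1/zz_generated.deepcopy.go", "README.md"]:
    print(candidate, triggers_rebuild(candidate))
```

Ignoring the generated deepcopy file keeps codegen output from causing an extra rebuild right after `make generate` runs.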
## Webhook Certificates
The operator handles webhook TLS certificates automatically at runtime using a
built-in cert controller (based on OPA cert-controller). On startup it:
1. Creates a self-signed CA and webhook serving certificate.
2. Stores them in the `webhook-server-cert` Secret.
3. Injects the CA bundle into `ValidatingWebhookConfiguration` and
`MutatingWebhookConfiguration` resources.
This matches production behavior and requires no external tooling. For
alternative certificate management (cert-manager or external certs), see the
[webhook documentation](../kubernetes/webhooks.md) and configure via
`helm_values` in `tilt-settings.yaml`.
## Typical Workflows
### Iterating on Controller Logic
The most common workflow — you're modifying reconciliation logic and want fast
feedback:
```yaml
# tilt-settings.yaml
allowed_contexts: [my-cluster]
registry: docker.io/myuser
skip_codegen: true
```
```bash
tilt up
# Edit files under internal/controller/
# Tilt auto-recompiles and live-updates
# Apply test resources:
kubectl apply -f examples/backends/vllm/deploy/agg.yaml
```
### Changing API Types (CRDs)
When you modify files under `api/`, you need codegen to run:
```yaml
# tilt-settings.yaml
skip_codegen: false # or omit — false is the default
```
Tilt will run `make generate && make manifests` and re-apply CRDs whenever
`api/` files change.
### Testing Multi-Node Features
Enable the necessary subcharts:
```yaml
# tilt-settings.yaml
enable_grove: true
enable_kai_scheduler: true
```
### Using Environment Variables
You can override the registry without editing the settings file:
```bash
REGISTRY=ghcr.io/myorg tilt up
```
## Tilt UI
The web UI at <http://localhost:10350> shows:
- **Resource status** — green/red/pending for each resource
- **Build logs** — compilation output and errors
- **Runtime logs** — operator logs streamed in real time
- **Port forwards** — the health endpoint is forwarded to `localhost:8081`
Resources are grouped by label (`operator` and `infrastructure`) to keep the
UI organized.
## Cleanup
```bash
# Stop Tilt and leave resources deployed
# (Ctrl-C in the terminal)
# Stop Tilt and tear down all resources
tilt down
```
## Troubleshooting
### Image Pull Errors
If pods show `ImagePullBackOff`:
- Verify `registry` is set in `tilt-settings.yaml` or via `REGISTRY` env var.
- Ensure your cluster nodes can pull from that registry.
- For kind with a local registry, follow the
[kind local registry guide](https://kind.sigs.k8s.io/docs/user/local-registry/).
### Webhook TLS Errors
If applying a DGD/DGDR fails with `x509: certificate signed by unknown authority`:
- Check the operator logs in the Tilt UI — the cert controller logs its
progress on startup.
- Verify the `webhook-server-cert` Secret exists and has been populated:
```bash
kubectl -n dynamo-system get secret webhook-server-cert
```
- The operator may need a few seconds after startup to generate certs and
inject the CA bundle. Wait for the `cert-controller` log messages before
applying resources.
### CRD Codegen Failures
If `crds` fails with codegen errors:
- Ensure `controller-gen` is installed: `make controller-gen`
- Try running codegen manually: `make generate && make manifests`
- Set `skip_codegen: true` temporarily to bypass if you haven't changed API types.
### Context Safety Guard
If Tilt refuses to start with a context error, add your cluster context to
`allowed_contexts` in `tilt-settings.yaml`:
```yaml
allowed_contexts:
- my-cluster-context
```
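
The idea behind the guard can be sketched as a simple allow-list check. This is a simplified model, not Tilt's implementation: Tilt also accepts certain well-known local cluster contexts by default, which the sketch does not cover, and `check_context` is a hypothetical helper.

```python
def check_context(current_context, allowed_contexts):
    """Return None if the context is allow-listed, else an error message."""
    allowed = list(allowed_contexts or [])
    if current_context in allowed:
        return None
    return "context %r is not in allowed_contexts %r" % (current_context, allowed)


print(check_context("my-cluster-context", ["my-cluster-context"]))
print(check_context("prod-cluster", ["my-cluster-context"]))
```

The name to add is the one printed by `kubectl config current-context`.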