Commit 2b36b175 (unverified), authored by alexanderbilk, committed by GitHub

feat: remote_write to external DB for agent_* metrics (#7198)


Signed-off-by: Aleksandr Bilkovskii <alexanderb@nvidia.com>
parent 115512ef
......@@ -36,6 +36,31 @@ This script will:
6. Deploy Grafana disaggregated dashboard ConfigMap (auto-imported by Grafana sidecar)
7. Provide Grafana credentials
### Remote write (optional)

To push scraped metrics to an external Prometheus (or any remote-write–compatible endpoint), set these environment variables before running the script:

| Variable                    | Description                               | Default                        |
|-----------------------------|-------------------------------------------|--------------------------------|
| `REMOTE_WRITE_HOST`         | Host of the remote write endpoint         | unset (remote write disabled)  |
| `REMOTE_WRITE_PORT`         | Port of the remote write endpoint         | `9091`                         |
| `REMOTE_WRITE_METRIC_REGEX` | Prometheus regex for metric names to send | `^agent_.*`                    |

Remote write is enabled only when `REMOTE_WRITE_HOST` is set. The script then configures Prometheus with a single `remote_write` target, `http://<host>:<port>/api/v1/write`, and requires `jq` to build that configuration. Only metrics whose name matches `REMOTE_WRITE_METRIC_REGEX` are sent; all others are dropped before sending. Any valid Prometheus regex works (e.g. `^agent_.*`, `^nixl_.*`, `^(agent_|nixl_).*`). Prometheus anchors relabel regexes at both ends, so the leading `^` is optional, but the trailing `.*` is required for a prefix match.

Examples:

```bash
# Default filter: ^agent_.*
REMOTE_WRITE_HOST=prom.example.com REMOTE_WRITE_PORT=9091 ./setup-monitoring.sh
# Custom filter: only nixl_ metrics
REMOTE_WRITE_HOST=prom.example.com REMOTE_WRITE_METRIC_REGEX='^nixl_.*' ./setup-monitoring.sh
# Multiple patterns: agent_ or nixl_
REMOTE_WRITE_HOST=prom.example.com REMOTE_WRITE_METRIC_REGEX='^(agent_|nixl_).*' ./setup-monitoring.sh
```
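
Before deploying, it can help to preview which metric names a given filter would keep. The sketch below is not part of `setup-monitoring.sh`: `would_keep` is a hypothetical helper, the metric names are made up for illustration, and `grep -E` only approximates the RE2 syntax Prometheus uses. It wraps the pattern in `^(...)$` to mimic the way Prometheus anchors relabel regexes at both ends.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of setup-monitoring.sh): report whether a
# metric name would pass a REMOTE_WRITE_METRIC_REGEX value.
# The pattern is wrapped in ^(...)$ to mimic Prometheus's full anchoring.
# Assumption: grep -E (POSIX ERE) is close enough to RE2 for simple patterns.
would_keep() {
  local regex="$1" name="$2"
  printf '%s\n' "$name" | grep -Eq "^(${regex})\$"
}

# Illustrative metric names only; substitute names from your own targets.
for name in agent_requests_total nixl_bytes_sent my_agent_metric; do
  if would_keep '^(agent_|nixl_).*' "$name"; then
    echo "keep $name"
  else
    echo "drop $name"
  fi
done
```

Because of the anchoring, a pattern such as `agent_` without a trailing `.*` would match only a metric named exactly `agent_`.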
## Verification
Check that GPU metrics are flowing:
......
......@@ -28,10 +28,36 @@ helm repo update
echo ""
echo "Step 3: Installing kube-prometheus-stack..."
echo "This includes: Prometheus Operator, Prometheus, Grafana, Alertmanager"
PROMETHEUS_REMOTE_WRITE_ARGS=()
if [ -n "${REMOTE_WRITE_HOST:-}" ]; then
  REMOTE_WRITE_PORT="${REMOTE_WRITE_PORT:-9091}"
  REMOTE_WRITE_METRIC_REGEX="${REMOTE_WRITE_METRIC_REGEX:-^agent_.*}"
  echo "Remote write enabled: pushing metrics to ${REMOTE_WRITE_HOST}:${REMOTE_WRITE_PORT} (filter: ${REMOTE_WRITE_METRIC_REGEX})"
  REMOTE_WRITE_URL="http://${REMOTE_WRITE_HOST}:${REMOTE_WRITE_PORT}/api/v1/write"
  REMOTE_WRITE_JSON="$(
    jq -cn \
      --arg url "$REMOTE_WRITE_URL" \
      --arg regex "$REMOTE_WRITE_METRIC_REGEX" \
      '[
        {
          url: $url,
          writeRelabelConfigs: [
            {
              sourceLabels: ["__name__"],
              regex: $regex,
              action: "keep"
            }
          ]
        }
      ]'
  )"
  PROMETHEUS_REMOTE_WRITE_ARGS=(--set-json "prometheus.prometheusSpec.remoteWrite=${REMOTE_WRITE_JSON}")
fi
helm upgrade --install prometheus -n monitoring --create-namespace prometheus-community/kube-prometheus-stack \
  --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
  --set-json 'prometheus.prometheusSpec.podMonitorNamespaceSelector={}' \
  --set-json 'prometheus.prometheusSpec.probeNamespaceSelector={}' \
  "${PROMETHEUS_REMOTE_WRITE_ARGS[@]}"
# Step 4: Wait for pods to be ready
echo ""
......
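
The remote-write block above interpolates `REMOTE_WRITE_PORT` into the URL and calls `jq` without checking either. A pre-flight check along the following lines could catch a bad port or a missing `jq` before Helm runs. This is a sketch only: `validate_remote_write_port` and the error messages are hypothetical and do not appear in the commit.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the committed script.
# Succeeds when the argument is a decimal port number in the range 1-65535.
validate_remote_write_port() {
  local port="$1"
  # At most five digits, so the arithmetic below cannot overflow.
  [[ "$port" =~ ^[0-9]{1,5}$ ]] || return 1
  # 10# forces base 10 so a value like "0080" is not parsed as octal.
  (( 10#$port >= 1 && 10#$port <= 65535 ))
}

# Only relevant when remote write is enabled, mirroring the script's own guard.
if [ -n "${REMOTE_WRITE_HOST:-}" ]; then
  REMOTE_WRITE_PORT="${REMOTE_WRITE_PORT:-9091}"
  if ! command -v jq >/dev/null 2>&1; then
    echo "jq is required when REMOTE_WRITE_HOST is set" >&2
    exit 1
  fi
  if ! validate_remote_write_port "$REMOTE_WRITE_PORT"; then
    echo "Invalid REMOTE_WRITE_PORT: ${REMOTE_WRITE_PORT}" >&2
    exit 1
  fi
fi
```

Placing such a check before the `jq` call would make the script fail early with a clear message instead of passing a malformed URL to Prometheus.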