Unverified commit ebd23361, authored by Keiven C, committed by GitHub

feat: add a new composite SW/HW grafana (DYN-678) (#1788)


Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent 0584b081
# Metrics # Metrics
The `metrics` component is a utility that can collect, aggregate, and publish The `metrics` component is a utility that can collect, aggregate, and publish
metrics from a Dynamo deployment for use in other applications or visualization metrics from a Dynamo deployment. After collecting and aggregating metrics from
tools like Prometheus and Grafana. workers, it exposes them via an HTTP `/metrics` endpoint in Prometheus format
that other applications or visualization tools like Prometheus server and Grafana can
pull from.
**Note**: This is a demo implementation. The metrics component is currently under active development and this documentation will change as the implementation evolves.
- In this demo the metric names use the prefix "llm", but in production they will be prefixed with "nv_llm" (e.g., the HTTP `/metrics` endpoint will serve metrics with "nv_llm" prefixes)
- This demo only works with `examples/llm/configs/agg.yaml`; other configurations will not work
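The prefix note above implies that a query or script written against the demo names has to be renamed for production. A minimal sketch of that mapping, assuming the rename is a plain prefix swap; the helper name `to_production_name` is hypothetical and not part of Dynamo:

```python
# Assumption: the only difference between demo and production names is the
# leading "llm_" vs "nv_llm_" prefix described in the note above.
DEMO_PREFIX = "llm_"
PRODUCTION_PREFIX = "nv_llm_"


def to_production_name(metric_name: str) -> str:
    """Map a demo metric name to its assumed production name.

    Names that do not start with the demo prefix (already-prefixed or
    unrelated metrics) are returned unchanged.
    """
    if metric_name.startswith(DEMO_PREFIX):
        return PRODUCTION_PREFIX + metric_name[len(DEMO_PREFIX):]
    return metric_name


if __name__ == "__main__":
    for name in ("llm_kv_blocks_active", "llm_kv_blocks_total"):
        print(name, "->", to_production_name(name))
```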
<div align="center"> <div align="center">
<img src="images/dynamo_metrics_grafana.png" alt="Dynamo Metrics Dashboard"/> <img src="images/dynamo_metrics_grafana.png" alt="Dynamo Metrics Dashboard"/>
...@@ -22,16 +28,16 @@ For example: ...@@ -22,16 +28,16 @@ For example:
```bash ```bash
# Default namespace is "dynamo", but can be configured with --namespace # Default namespace is "dynamo", but can be configured with --namespace
# For more detailed output, try setting the env var: DYN_LOG=debug # For more detailed output, try setting the env var: DYN_LOG=debug
metrics --component my_component --endpoint my_endpoint metrics --component MyComponent --endpoint my_endpoint
# 2025-03-17T00:07:05.202558Z INFO metrics: Scraping endpoint dynamo/my_component/my_endpoint for stats # 2025-03-17T00:07:05.202558Z INFO metrics: Scraping endpoint dynamo/MyComponent/my_endpoint for stats
# 2025-03-17T00:07:05.202955Z INFO metrics: Prometheus metrics server started at 0.0.0.0:9091/metrics # 2025-03-17T00:07:05.202955Z INFO metrics: Prometheus metrics server started at 0.0.0.0:9091/metrics
# ... # ...
``` ```
With no matching endpoints running to collect stats from, you should see warnings in the logs: With no matching endpoints running to collect stats from, you should see warnings in the logs:
```bash ```bash
2025-03-17T00:07:06.204756Z WARN metrics: No endpoints found matching dynamo/my_component/my_endpoint 2025-03-17T00:07:06.204756Z WARN metrics: No endpoints found matching dynamo/MyComponent/my_endpoint
``` ```
After a worker with a matching endpoint gets started, the endpoint After a worker with a matching endpoint gets started, the endpoint
...@@ -44,22 +50,23 @@ so below are some examples of workers and how they can be monitored. ...@@ -44,22 +50,23 @@ so below are some examples of workers and how they can be monitored.
### Mock Worker ### Mock Worker
For quick testing and debugging, there is a Rust-based To try out how `metrics` works, there is a demo Rust-based
[mock worker](src/bin/mock_worker.rs) that registers a mock [mock worker](src/bin/mock_worker.rs) that provides sample data through two mechanisms:
`StatsHandler` under an endpoint named 1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from `metrics`) with randomly generated `ForwardPassMetrics` data
`dynamo/my_component/my_endpoint` and publishes random data. 2. Publishes mock `KVHitRateEvent` data every second to demonstrate event-based metrics
Step 1: Launch a mock worker via the following command (if already built):
```bash ```bash
# Can run multiple workers in separate shells to see aggregation as well. # or build/run from source: DYN_LOG=DEBUG cargo run --bin mock_worker
# Or to build/run from source: cargo run --bin mock_worker
mock_worker mock_worker
# 2025-03-16T23:49:28.101668Z INFO mock_worker: Starting Mock Worker on Endpoint: dynamo/my_component/my_endpoint # 2025-03-16T23:49:28.101668Z INFO mock_worker: Starting Mock Worker on Endpoint: dynamo/MyComponent/my_endpoint
``` ```
To monitor the metrics of these mock workers, run: Step 2: Run `metrics` to monitor these mock workers. It serves its own Prometheus endpoint on `/metrics` at
port 9091 (the default when `--port` is not specified):
```bash ```bash
metrics --component my_component --endpoint my_endpoint metrics --component MyComponent --endpoint my_endpoint
``` ```
### Real Worker ### Real Worker
...@@ -69,13 +76,14 @@ see the examples in [examples/llm](../../examples/llm). ...@@ -69,13 +76,14 @@ see the examples in [examples/llm](../../examples/llm).
For example, for a VLLM + KV Routing based deployment that For example, for a VLLM + KV Routing based deployment that
exposes statistics on an endpoint labeled exposes statistics on an endpoint labeled
`dynamo/VllmWorker/load_metrics`: `dynamo/VllmWorker/load_metrics` (note: this does NOT currently work
with any other example such as examples/vllm_v0, vllm_v1, ...):
```bash ```bash
cd deploy/examples/llm cd deploy/examples/llm
dynamo serve <vllm kv routing example args> dynamo serve graphs.agg:Frontend -f configs/agg.yaml
``` ```
To monitor the metrics of these VllmWorkers, run: Then, to monitor the metrics of these VllmWorkers, run:
```bash ```bash
metrics --component VllmWorker --endpoint load_metrics metrics --component VllmWorker --endpoint load_metrics
``` ```
...@@ -105,10 +113,10 @@ Prometheus server or curl client can pull from: ...@@ -105,10 +113,10 @@ Prometheus server or curl client can pull from:
```bash ```bash
# Start metrics server on default host (0.0.0.0) and port (9091) # Start metrics server on default host (0.0.0.0) and port (9091)
metrics --component my_component --endpoint my_endpoint metrics --component MyComponent --endpoint my_endpoint
# Or specify a custom port # Or specify a custom port
metrics --component my_component --endpoint my_endpoint --port 9092 metrics --component MyComponent --endpoint my_endpoint --port 9092
``` ```
In pull mode: In pull mode:
...@@ -121,12 +129,12 @@ curl localhost:9091/metrics ...@@ -121,12 +129,12 @@ curl localhost:9091/metrics
# # HELP llm_kv_blocks_active Active KV cache blocks # # HELP llm_kv_blocks_active Active KV cache blocks
# # TYPE llm_kv_blocks_active gauge # # TYPE llm_kv_blocks_active gauge
# llm_kv_blocks_active{component="my_component",endpoint="my_endpoint",worker_id="7587884888253033398"} 40 # llm_kv_blocks_active{component="MyComponent",endpoint="my_endpoint",worker_id="7587884888253033398"} 40
# llm_kv_blocks_active{component="my_component",endpoint="my_endpoint",worker_id="7587884888253033401"} 2 # llm_kv_blocks_active{component="MyComponent",endpoint="my_endpoint",worker_id="7587884888253033401"} 2
# # HELP llm_kv_blocks_total Total KV cache blocks # # HELP llm_kv_blocks_total Total KV cache blocks
# # TYPE llm_kv_blocks_total gauge # # TYPE llm_kv_blocks_total gauge
# llm_kv_blocks_total{component="my_component",endpoint="my_endpoint",worker_id="7587884888253033398"} 100 # llm_kv_blocks_total{component="MyComponent",endpoint="my_endpoint",worker_id="7587884888253033398"} 100
# llm_kv_blocks_total{component="my_component",endpoint="my_endpoint",worker_id="7587884888253033401"} 100 # llm_kv_blocks_total{component="MyComponent",endpoint="my_endpoint",worker_id="7587884888253033401"} 100
``` ```
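To make the sample output above concrete, here is a minimal Python sketch that parses those lines and derives KV cache utilization, the same ratio a dashboard query such as `100 * llm_kv_blocks_active / llm_kv_blocks_total` would compute. The helper names (`parse_samples`, `kv_utilization`, `average_utilization`) are illustrative, not part of the `metrics` component, and the parser only handles the simple labelled-gauge lines shown here rather than the full Prometheus text format.

```python
import re
from collections import defaultdict

# Sample copied from the curl output above, with the leading "# " that the
# shell snippet adds to each line removed.
SAMPLE = """\
# HELP llm_kv_blocks_active Active KV cache blocks
# TYPE llm_kv_blocks_active gauge
llm_kv_blocks_active{component="MyComponent",endpoint="my_endpoint",worker_id="7587884888253033398"} 40
llm_kv_blocks_active{component="MyComponent",endpoint="my_endpoint",worker_id="7587884888253033401"} 2
# HELP llm_kv_blocks_total Total KV cache blocks
# TYPE llm_kv_blocks_total gauge
llm_kv_blocks_total{component="MyComponent",endpoint="my_endpoint",worker_id="7587884888253033398"} 100
llm_kv_blocks_total{component="MyComponent",endpoint="my_endpoint",worker_id="7587884888253033401"} 100
"""

# name{label="value",...} value
LINE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return {metric_name: {worker_id: value}} for labelled sample lines."""
    samples = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser does not cover
        labels = dict(LABEL.findall(match.group("labels")))
        worker = labels.get("worker_id", "")
        samples[match.group("name")][worker] = float(match.group("value"))
    return samples


def kv_utilization(samples):
    """Per-worker utilization in percent: 100 * active / total."""
    active = samples["llm_kv_blocks_active"]
    total = samples["llm_kv_blocks_total"]
    return {w: 100.0 * active[w] / total[w] for w in active if total.get(w)}


def average_utilization(samples):
    """Aggregate utilization in percent: 100 * avg(active) / avg(total)."""
    active = samples["llm_kv_blocks_active"].values()
    total = samples["llm_kv_blocks_total"].values()
    return 100.0 * (sum(active) / len(active)) / (sum(total) / len(total))


if __name__ == "__main__":
    parsed = parse_samples(SAMPLE)
    for worker, percent in kv_utilization(parsed).items():
        print(f"worker {worker}: {percent:.1f}% of KV blocks in use")
    print(f"average: {average_utilization(parsed):.1f}%")
```

For the two workers in the sample this gives 40% and 2% per worker, and 21% on average. In a real deployment the text would come from the `/metrics` endpoint instead of the hard-coded `SAMPLE` string.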
### Push Mode ### Push Mode
...@@ -145,7 +153,7 @@ Start the metrics component in `--push` mode, specifying the host and port of yo ...@@ -145,7 +153,7 @@ Start the metrics component in `--push` mode, specifying the host and port of yo
```bash ```bash
# Push metrics to a Prometheus PushGateway every --push-interval seconds # Push metrics to a Prometheus PushGateway every --push-interval seconds
metrics \ metrics \
--component my_component \ --component MyComponent \
--endpoint my_endpoint \ --endpoint my_endpoint \
--host 127.0.0.1 \ --host 127.0.0.1 \
--port 9091 \ --port 9091 \
...@@ -173,7 +181,7 @@ For easy iteration while making edits to the metrics component, you can use `car ...@@ -173,7 +181,7 @@ For easy iteration while making edits to the metrics component, you can use `car
to build and run with your local changes: to build and run with your local changes:
```bash ```bash
cargo run --bin metrics -- --component my_component --endpoint my_endpoint cargo run --bin metrics -- --component MyComponent --endpoint my_endpoint
``` ```
...@@ -146,7 +146,7 @@ async fn backend(runtime: DistributedRuntime) -> Result<()> { ...@@ -146,7 +146,7 @@ async fn backend(runtime: DistributedRuntime) -> Result<()> {
let namespace = runtime.namespace("dynamo")?; let namespace = runtime.namespace("dynamo")?;
// we must first create a service, then we can attach one or more endpoints // we must first create a service, then we can attach one or more endpoints
let component = namespace let component = namespace
.component("my_component")? .component("MyComponent")?
.service_builder() .service_builder()
.create() .create()
.await?; .await?;
......
...@@ -100,16 +100,18 @@ Note: You may need to adjust the target based on your host configuration and net ...@@ -100,16 +100,18 @@ Note: You may need to adjust the target based on your host configuration and net
Grafana is pre-configured with: Grafana is pre-configured with:
- Prometheus datasource - Prometheus datasource
- Sample dashboard for visualizing service metrics - Sample dashboard for visualizing service metrics
![grafana image](./grafana1.png) ![grafana image](./grafana-dynamo-composite.png)
## Required Files ## Required Files
The following configuration files should be present in this directory: The following configuration files should be present in this directory:
- [docker-compose.yml](./docker-compose.yml): Defines the Prometheus and Grafana services - [docker-compose.yml](./docker-compose.yml): Defines the Prometheus and Grafana services
- [prometheus.yml](./prometheus.yml): Contains Prometheus scraping configuration - [prometheus.yml](./prometheus.yml): Contains Prometheus scraping configuration
- [grafana.json](./grafana.json): Contains Grafana dashboard configuration
- [grafana-datasources.yml](./grafana-datasources.yml): Contains Grafana datasource configuration - [grafana-datasources.yml](./grafana-datasources.yml): Contains Grafana datasource configuration
- [grafana-dashboard-providers.yml](./grafana-dashboard-providers.yml): Contains Grafana dashboard provider configuration - [grafana_dashboards/grafana-dashboard-providers.yml](./grafana_dashboards/grafana-dashboard-providers.yml): Contains Grafana dashboard provider configuration
- [grafana_dashboards/grafana-dynamo-dashboard.json](./grafana_dashboards/grafana-dynamo-dashboard.json): A general Dynamo Dashboard for both SW and HW metrics.
- [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): Contains Grafana dashboard configuration for LLM specific metrics.
- [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
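Before starting the stack, it can help to confirm that the files listed above are actually present. The following is a small sketch, assuming the updated layout from this change (dashboards under `grafana_dashboards/`) and that it is run against the directory containing `docker-compose.yml`; the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# File list taken from the updated "Required Files" section above.
REQUIRED_FILES = (
    "docker-compose.yml",
    "prometheus.yml",
    "grafana-datasources.yml",
    "grafana_dashboards/grafana-dashboard-providers.yml",
    "grafana_dashboards/grafana-dynamo-dashboard.json",
    "grafana_dashboards/grafana-llm-metrics.json",
    "grafana_dashboards/grafana-dcgm-metrics.json",
)


def missing_files(base_dir="."):
    """Return the required files that do not exist under base_dir."""
    base = Path(base_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing configuration files:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All required configuration files are present.")
```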
## Running the example `metrics` component ## Running the example `metrics` component
......
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
# IMPORTANT NOTE: Make sure this is in sync with lib/runtime/docker-compose.yml
networks: networks:
server: server:
driver: bridge driver: bridge
...@@ -83,6 +84,8 @@ services: ...@@ -83,6 +84,8 @@ services:
networks: networks:
- monitoring - monitoring
# To access Prometheus from another machine, you may need to open port 9090 in your host's firewall. On Ubuntu:
# sudo ufw allow 9090/tcp
prometheus: prometheus:
image: prom/prometheus:v3.4.1 image: prom/prometheus:v3.4.1
container_name: prometheus container_name: prometheus
...@@ -98,11 +101,13 @@ services: ...@@ -98,11 +101,13 @@ services:
restart: unless-stopped restart: unless-stopped
# Example to pull from the /query endpoint: # Example to pull from the /query endpoint:
# {__name__=~"DCGM.*", job="dcgm-exporter"} # {__name__=~"DCGM.*", job="dcgm-exporter"}
ports:
- "9090:9090"
networks: networks:
- monitoring - monitoring
ports:
- "9090:9090"
profiles: [metrics] profiles: [metrics]
extra_hosts:
- "host.docker.internal:host-gateway"
depends_on: depends_on:
- dcgm-exporter - dcgm-exporter
- nats-prometheus-exporter - nats-prometheus-exporter
...@@ -110,23 +115,29 @@ services: ...@@ -110,23 +115,29 @@ services:
# grafana connects to prometheus via the /query endpoint. # grafana connects to prometheus via the /query endpoint.
# Default credentials are dynamo/dynamo. # Default credentials are dynamo/dynamo.
# To access Grafana from another machine, you may need to open port 3001 in your host's firewall. On Ubuntu:
# sudo ufw allow 3001/tcp
grafana: grafana:
image: grafana/grafana-enterprise:12.0.1 image: grafana/grafana-enterprise:12.0.1
container_name: grafana container_name: grafana
volumes: volumes:
- ./grafana.json:/etc/grafana/provisioning/dashboards/llm-worker-dashboard.json - ./grafana_dashboards:/etc/grafana/provisioning/dashboards
- ./grafana-dcgm-dashboard.json:/etc/grafana/provisioning/dashboards/dcgm-dashboard.json
- ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
- ./grafana-dashboard-providers.yml:/etc/grafana/provisioning/dashboards/dashboard-providers.yml
environment: environment:
# Port 3000 is already used by "dynamo serve", so use 3001 # Port 3000 is already used by "dynamo serve", so use 3001
- GF_SERVER_HTTP_PORT=3001 - GF_SERVER_HTTP_PORT=3001
# do not make it admin/admin, because you will be prompted to change the password every time
- GF_SECURITY_ADMIN_USER=dynamo - GF_SECURITY_ADMIN_USER=dynamo
- GF_SECURITY_ADMIN_PASSWORD=dynamo - GF_SECURITY_ADMIN_PASSWORD=dynamo
- GF_USERS_ALLOW_SIGN_UP=false - GF_USERS_ALLOW_SIGN_UP=false
- GF_INSTALL_PLUGINS=grafana-piechart-panel - GF_INSTALL_PLUGINS=grafana-piechart-panel
# Default min interval is 5s, but can be configured lower # Default min interval is 5s, but can be configured lower
- GF_DASHBOARDS_MIN_REFRESH_INTERVAL=2s - GF_DASHBOARDS_MIN_REFRESH_INTERVAL=2s
# Disable password change requirement
- GF_SECURITY_DISABLE_INITIAL_ADMIN_CREATION=false
- GF_SECURITY_ADMIN_PASSWORD_POLICY=false
- GF_AUTH_DISABLE_LOGIN_FORM=false
- GF_AUTH_DISABLE_SIGNOUT_MENU=false
restart: unless-stopped restart: unless-stopped
ports: ports:
- "3001:3001" - "3001:3001"
......
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"copyright": [
"SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.",
"SPDX-License-Identifier: Apache-2.0",
"Licensed under the Apache License, Version 2.0 (the \"License\");",
"you may not use this file except in compliance with the License.",
"You may obtain a copy of the License at",
"http://www.apache.org/licenses/LICENSE-2.0",
"Unless required by applicable law or agreed to in writing, software",
"distributed under the License is distributed on an \"AS IS\" BASIS,",
"WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.",
"See the License for the specific language governing permissions and",
"limitations under the License."
],
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": 1,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [
"mean",
"max"
],
"displayMode": "table",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"title": "KV Cache Utilization by Worker",
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "100 * llm_kv_blocks_active{component=\"$component\", endpoint=\"$endpoint\"} / llm_kv_blocks_total{component=\"$component\", endpoint=\"$endpoint\"}",
"legendFormat": "Worker {{worker_id}}",
"range": true,
"refId": "A"
}
]
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [
"mean",
"max"
],
"displayMode": "table",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"title": "Request Slot Utilization by Worker",
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "100 * llm_requests_active_slots{component=\"$component\", endpoint=\"$endpoint\"} / llm_requests_total_slots{component=\"$component\", endpoint=\"$endpoint\"}",
"legendFormat": "Worker {{worker_id}}",
"range": true,
"refId": "A"
}
]
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 50
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 4,
"x": 0,
"y": 8
},
"id": 3,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "10.0.0",
"title": "Average KV Cache Utilization",
"type": "gauge",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "100 * avg(llm_kv_blocks_active{component=\"$component\", endpoint=\"$endpoint\"}) / avg(llm_kv_blocks_total{component=\"$component\", endpoint=\"$endpoint\"})",
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
]
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 50
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 4,
"x": 4,
"y": 8
},
"id": 4,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "10.0.0",
"title": "Average Request Slot Utilization",
"type": "gauge",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "100 * avg(llm_requests_active_slots{component=\"$component\", endpoint=\"$endpoint\"}) / avg(llm_requests_total_slots{component=\"$component\", endpoint=\"$endpoint\"})",
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
]
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 4,
"x": 8,
"y": 8
},
"id": 7,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "10.0.0",
"title": "Average KV Cache Hit Rate",
"type": "gauge",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "100 * avg(llm_kv_hit_rate_percent{component=\"$component\", endpoint=\"$endpoint\"})",
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
]
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8
},
"id": 5,
"options": {
"legend": {
"calcs": [
"mean",
"max"
],
"displayMode": "table",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"title": "Load Average & Standard Deviation",
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "llm_load_avg{component=\"$component\", endpoint=\"$endpoint\"}",
"legendFormat": "Average",
"range": true,
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "llm_load_std{component=\"$component\", endpoint=\"$endpoint\"}",
"hide": false,
"legendFormat": "StdDev",
"range": true,
"refId": "B"
}
]
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 16
},
"id": 8,
"options": {
"legend": {
"calcs": [
"mean",
"max"
],
"displayMode": "table",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"title": "KV Cache Hit Rate by Worker",
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "100 * llm_kv_hit_rate_percent{component=\"$component\", endpoint=\"$endpoint\"}",
"legendFormat": "Worker {{worker_id}}",
"range": true,
"refId": "A"
}
]
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 16
},
"id": 9,
"options": {
"legend": {
"calcs": [
"mean",
"max"
],
"displayMode": "table",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"title": "Average KV Cache Hit Rate",
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "avg(100 * llm_kv_hit_rate_percent{component=\"$component\", endpoint=\"$endpoint\"})",
"legendFormat": "Average Hit Rate",
"range": true,
"refId": "A"
}
]
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 24
},
"id": 6,
"options": {
"legend": {
"calcs": [
"mean",
"max"
],
"displayMode": "table",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"title": "Available Resources",
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "sum(llm_kv_blocks_total{component=\"$component\", endpoint=\"$endpoint\"} - llm_kv_blocks_active{component=\"$component\", endpoint=\"$endpoint\"})",
"legendFormat": "Available KV Blocks",
"range": true,
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "sum(llm_requests_total_slots{component=\"$component\", endpoint=\"$endpoint\"} - llm_requests_active_slots{component=\"$component\", endpoint=\"$endpoint\"})",
"hide": false,
"legendFormat": "Available Request Slots",
"range": true,
"refId": "B"
}
]
}
],
"refresh": "2s",
"schemaVersion": 38,
"style": "dark",
"tags": [
"llm",
"metrics"
],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "component",
"value": "vllm"
},
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"definition": "label_values(llm_kv_blocks_active, component)",
"hide": 0,
"includeAll": false,
"label": "Component",
"multi": false,
"name": "component",
"options": [],
"query": {
"query": "label_values(llm_kv_blocks_active, component)",
"refId": "StandardVariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"type": "query"
},
{
"current": {
"selected": false,
"text": "endpoint",
"value": "load_metrics"
},
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"definition": "label_values(llm_kv_blocks_active{component=\"$component\"}, endpoint)",
"hide": 0,
"includeAll": false,
"label": "Endpoint",
"multi": false,
"name": "endpoint",
"options": [],
"query": {
"query": "label_values(llm_kv_blocks_active{component=\"$component\"}, endpoint)",
"refId": "StandardVariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"type": "query"
}
]
},
"time": {
"from": "now-5m",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "LLM Worker Metrics",
"uid": "llm-worker-metrics",
"version": 1,
"weekStart": ""
}
\ No newline at end of file
...@@ -15,57 +15,48 @@ ...@@ -15,57 +15,48 @@
} }
] ]
}, },
"copyright": [ "description": "Various stats, from Dynamo runtime, GPU HW, NATS, etcd, ...",
"SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.",
"SPDX-License-Identifier: Apache-2.0",
"Licensed under the Apache License, Version 2.0 (the \"License\");",
"you may not use this file except in compliance with the License.",
"You may obtain a copy of the License at",
"http://www.apache.org/licenses/LICENSE-2.0",
"Unless required by applicable law or agreed to in writing, software",
"distributed under the License is distributed on an \"AS IS\" BASIS,",
"WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.",
"See the License for the specific language governing permissions and",
"limitations under the License."
],
"editable": true, "editable": true,
"fiscalYearStartMonth": 0, "fiscalYearStartMonth": 0,
"graphTooltip": 0, "graphTooltip": 0,
"id": 2, "id": 1,
"links": [], "links": [],
"liveNow": false,
"panels": [ "panels": [
{ {
"datasource": { "datasource": {
"type": "prometheus", "type": "prometheus",
"uid": "prometheus" "uid": "P1809F7CD0C75ACF3"
}, },
"description": "nv_llm_http_service_requests_total (1m)",
"fieldConfig": { "fieldConfig": {
"defaults": { "defaults": {
"color": { "color": {
"mode": "palette-classic" "mode": "palette-classic"
}, },
"custom": { "custom": {
"axisBorderShow": false,
"axisCenteredZero": false, "axisCenteredZero": false,
"axisColorMode": "text", "axisColorMode": "text",
"axisLabel": "", "axisLabel": "Requests",
"axisPlacement": "auto", "axisPlacement": "auto",
"barAlignment": 0, "barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line", "drawStyle": "line",
"fillOpacity": 20, "fillOpacity": 0,
"gradientMode": "none", "gradientMode": "none",
"hideFrom": { "hideFrom": {
"legend": false, "legend": false,
"tooltip": false, "tooltip": false,
"viz": false "viz": false
}, },
"lineInterpolation": "smooth", "insertNulls": false,
"lineWidth": 2, "lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5, "pointSize": 5,
"scaleDistribution": { "scaleDistribution": {
"type": "linear" "type": "linear"
}, },
"showPoints": "never", "showPoints": "auto",
"spanNulls": false, "spanNulls": false,
"stacking": { "stacking": {
"group": "A", "group": "A",
...@@ -80,90 +71,85 @@ ...@@ -80,90 +71,85 @@
"mode": "absolute", "mode": "absolute",
"steps": [ "steps": [
{ {
"color": "green", "color": "green"
"value": null
}, },
{ {
"color": "red", "color": "red",
"value": 80 "value": 80
} }
] ]
}, }
"unit": "percent",
"min": 0,
"max": 100
}, },
"overrides": [] "overrides": []
}, },
"gridPos": { "gridPos": {
"h": 8, "h": 8,
"w": 12, "w": 8,
"x": 0, "x": 0,
"y": 0 "y": 0
}, },
"id": 1, "id": 14,
"options": { "options": {
"legend": { "legend": {
"calcs": [ "calcs": [],
"mean", "displayMode": "list",
"max" "placement": "bottom",
],
"displayMode": "table",
"placement": "right",
"showLegend": true "showLegend": true
}, },
"tooltip": { "tooltip": {
"mode": "multi", "hideZeros": false,
"mode": "single",
"sort": "none" "sort": "none"
} }
}, },
"title": "GPU Utilization", "pluginVersion": "12.0.1",
"type": "timeseries",
"targets": [ "targets": [
{ {
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code", "editorMode": "code",
"expr": "DCGM_FI_DEV_GPU_UTIL", "expr": "rate(nv_llm_http_service_requests_total[30s])",
"legendFormat": "GPU {{gpu}} ({{modelName}})", "legendFormat": "{{request_type}}, {{status}},",
"range": true, "range": true,
"refId": "A" "refId": "A"
} }
] ],
"title": "Requests / Sec",
"type": "timeseries"
}, },
{ {
"datasource": { "datasource": {
"type": "prometheus", "type": "prometheus",
"uid": "prometheus" "uid": "P1809F7CD0C75ACF3"
}, },
"description": "nv_llm_http_service_time_to_first_token_seconds (sum/count)",
"fieldConfig": { "fieldConfig": {
"defaults": { "defaults": {
"color": { "color": {
"mode": "palette-classic" "mode": "palette-classic"
}, },
"custom": { "custom": {
"axisBorderShow": false,
"axisCenteredZero": false, "axisCenteredZero": false,
"axisColorMode": "text", "axisColorMode": "text",
"axisLabel": "", "axisLabel": "milliseconds",
"axisPlacement": "auto", "axisPlacement": "auto",
"barAlignment": 0, "barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line", "drawStyle": "line",
"fillOpacity": 20, "fillOpacity": 0,
"gradientMode": "none", "gradientMode": "none",
"hideFrom": { "hideFrom": {
"legend": false, "legend": false,
"tooltip": false, "tooltip": false,
"viz": false "viz": false
}, },
"insertNulls": false,
"lineInterpolation": "smooth", "lineInterpolation": "smooth",
"lineWidth": 2, "lineWidth": 1,
"pointSize": 5, "pointSize": 5,
"scaleDistribution": { "scaleDistribution": {
"type": "linear" "type": "linear"
}, },
"showPoints": "never", "showPoints": "auto",
"spanNulls": false, "spanNulls": false,
"stacking": { "stacking": {
"group": "A", "group": "A",
...@@ -178,100 +164,85 @@ ...@@ -178,100 +164,85 @@
"mode": "absolute", "mode": "absolute",
"steps": [ "steps": [
{ {
"color": "green", "color": "green"
"value": null
}, },
{ {
"color": "red", "color": "red",
"value": 80 "value": 80
} }
] ]
}, }
"unit": "bytes",
"min": 0
}, },
"overrides": [] "overrides": []
}, },
"gridPos": { "gridPos": {
"h": 8, "h": 8,
"w": 12, "w": 8,
"x": 12, "x": 8,
"y": 0 "y": 0
}, },
"id": 2, "id": 12,
"options": { "options": {
"legend": { "legend": {
"calcs": [ "calcs": [],
"mean", "displayMode": "list",
"max" "placement": "bottom",
],
"displayMode": "table",
"placement": "right",
"showLegend": true "showLegend": true
}, },
"tooltip": { "tooltip": {
"mode": "multi", "hideZeros": false,
"mode": "single",
"sort": "none" "sort": "none"
} }
}, },
"title": "GPU Memory Usage", "pluginVersion": "12.0.1",
"type": "timeseries",
"targets": [ "targets": [
{ {
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code", "editorMode": "code",
"expr": "DCGM_FI_DEV_FB_USED * 1024 * 1024", "expr": "1000*(nv_llm_http_service_time_to_first_token_seconds_sum/nv_llm_http_service_time_to_first_token_seconds_count)",
"legendFormat": "GPU {{gpu}} Used", "legendFormat": "{{model}}",
"range": true, "range": true,
"refId": "A" "refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "DCGM_FI_DEV_FB_FREE * 1024 * 1024",
"legendFormat": "GPU {{gpu}} Free",
"range": true,
"refId": "B"
} }
] ],
"title": "Avg Time to First Token",
"type": "timeseries"
}, },
{ {
"datasource": { "datasource": {
"type": "prometheus", "type": "prometheus",
"uid": "prometheus" "uid": "P1809F7CD0C75ACF3"
}, },
"description": "nv_llm_http_service_inter_token_latency_seconds (sum/count)",
"fieldConfig": { "fieldConfig": {
"defaults": { "defaults": {
"color": { "color": {
"mode": "palette-classic" "mode": "palette-classic"
}, },
"custom": { "custom": {
"axisBorderShow": false,
"axisCenteredZero": false, "axisCenteredZero": false,
"axisColorMode": "text", "axisColorMode": "text",
"axisLabel": "", "axisLabel": "milliseconds",
"axisPlacement": "auto", "axisPlacement": "auto",
"barAlignment": 0, "barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line", "drawStyle": "line",
"fillOpacity": 20, "fillOpacity": 0,
"gradientMode": "none", "gradientMode": "none",
"hideFrom": { "hideFrom": {
"legend": false, "legend": false,
"tooltip": false, "tooltip": false,
"viz": false "viz": false
}, },
"lineInterpolation": "smooth", "insertNulls": false,
"lineWidth": 2, "lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5, "pointSize": 5,
"scaleDistribution": { "scaleDistribution": {
"type": "linear" "type": "linear"
}, },
"showPoints": "never", "showPoints": "auto",
"spanNulls": false, "spanNulls": false,
"stacking": { "stacking": {
"group": "A", "group": "A",
...@@ -286,103 +257,85 @@ ...@@ -286,103 +257,85 @@
"mode": "absolute", "mode": "absolute",
"steps": [ "steps": [
{ {
"color": "green", "color": "green"
"value": null
},
{
"color": "yellow",
"value": 70
}, },
{ {
"color": "red", "color": "red",
"value": 85 "value": 80
} }
] ]
}, }
"unit": "celsius"
}, },
"overrides": [] "overrides": []
}, },
"gridPos": { "gridPos": {
"h": 8, "h": 8,
"w": 12, "w": 8,
"x": 0, "x": 16,
"y": 8 "y": 0
}, },
"id": 3, "id": 16,
"options": { "options": {
"legend": { "legend": {
"calcs": [ "calcs": [],
"mean", "displayMode": "list",
"max" "placement": "bottom",
],
"displayMode": "table",
"placement": "right",
"showLegend": true "showLegend": true
}, },
"tooltip": { "tooltip": {
"mode": "multi", "hideZeros": false,
"mode": "single",
"sort": "none" "sort": "none"
} }
}, },
"title": "GPU Temperature", "pluginVersion": "12.0.1",
"type": "timeseries",
"targets": [ "targets": [
{ {
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code", "editorMode": "code",
"expr": "DCGM_FI_DEV_GPU_TEMP", "expr": "1000*(nv_llm_http_service_inter_token_latency_seconds_sum/nv_llm_http_service_inter_token_latency_seconds_count)",
"legendFormat": "GPU {{gpu}} Temp", "legendFormat": "{{model}}",
"range": true, "range": true,
"refId": "A" "refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "DCGM_FI_DEV_MEMORY_TEMP",
"legendFormat": "GPU {{gpu}} Memory Temp",
"range": true,
"refId": "B"
} }
] ],
"title": "Avg Inter-Token Latency",
"type": "timeseries"
}, },
{ {
"datasource": { "datasource": {
"type": "prometheus", "type": "prometheus",
"uid": "prometheus" "uid": "P1809F7CD0C75ACF3"
}, },
"description": "nv_llm_http_service_request_duration (sum/count)",
"fieldConfig": { "fieldConfig": {
"defaults": { "defaults": {
"color": { "color": {
"mode": "palette-classic" "mode": "palette-classic"
}, },
"custom": { "custom": {
"axisBorderShow": false,
"axisCenteredZero": false, "axisCenteredZero": false,
"axisColorMode": "text", "axisColorMode": "text",
"axisLabel": "", "axisLabel": "milliseconds",
"axisPlacement": "auto", "axisPlacement": "auto",
"barAlignment": 0, "barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line", "drawStyle": "line",
"fillOpacity": 20, "fillOpacity": 0,
"gradientMode": "none", "gradientMode": "none",
"hideFrom": { "hideFrom": {
"legend": false, "legend": false,
"tooltip": false, "tooltip": false,
"viz": false "viz": false
}, },
"lineInterpolation": "smooth", "insertNulls": false,
"lineWidth": 2, "lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5, "pointSize": 5,
"scaleDistribution": { "scaleDistribution": {
"type": "linear" "type": "linear"
}, },
"showPoints": "never", "showPoints": "auto",
"spanNulls": false, "spanNulls": false,
"stacking": { "stacking": {
"group": "A", "group": "A",
...@@ -397,187 +350,85 @@ ...@@ -397,187 +350,85 @@
"mode": "absolute", "mode": "absolute",
"steps": [ "steps": [
{ {
"color": "green", "color": "green"
"value": null
}
]
},
"unit": "watt"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8
},
"id": 4,
"options": {
"legend": {
"calcs": [
"mean",
"max"
],
"displayMode": "table",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
}, },
"title": "GPU Power Usage",
"type": "timeseries",
"targets": [
{ {
"datasource": { "color": "red",
"type": "prometheus", "value": 80
"uid": "prometheus"
},
"editorMode": "code",
"expr": "DCGM_FI_DEV_POWER_USAGE",
"legendFormat": "GPU {{gpu}} Power",
"range": true,
"refId": "A"
} }
] ]
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
} }
}, },
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "hertz"
},
"overrides": [] "overrides": []
}, },
"gridPos": { "gridPos": {
"h": 8, "h": 8,
"w": 12, "w": 8,
"x": 0, "x": 0,
"y": 16 "y": 8
}, },
"id": 5, "id": 17,
"options": { "options": {
"legend": { "legend": {
"calcs": [ "calcs": [],
"mean", "displayMode": "list",
"max" "placement": "bottom",
],
"displayMode": "table",
"placement": "right",
"showLegend": true "showLegend": true
}, },
"tooltip": { "tooltip": {
"mode": "multi", "hideZeros": false,
"mode": "single",
"sort": "none" "sort": "none"
} }
}, },
"title": "GPU Clock Speeds", "pluginVersion": "12.0.1",
"type": "timeseries",
"targets": [ "targets": [
{ {
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code", "editorMode": "code",
"expr": "DCGM_FI_DEV_SM_CLOCK * 1000000", "expr": "1000*(nv_llm_http_service_request_duration_seconds_sum / nv_llm_http_service_request_duration_seconds_count)",
"legendFormat": "GPU {{gpu}} SM Clock", "legendFormat": "{{model}}",
"range": true, "range": true,
"refId": "A" "refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "DCGM_FI_DEV_MEM_CLOCK * 1000000",
"legendFormat": "GPU {{gpu}} Memory Clock",
"range": true,
"refId": "B"
} }
] ],
"title": "Avg Request Duration",
"type": "timeseries"
}, },
{ {
"datasource": { "datasource": {
"type": "prometheus", "type": "prometheus",
"uid": "prometheus" "uid": "P1809F7CD0C75ACF3"
}, },
"description": "The length is the number of tokens. nv_llm_http_service_input_sequence_tokens",
"fieldConfig": { "fieldConfig": {
"defaults": { "defaults": {
"color": { "color": {
"mode": "palette-classic" "mode": "palette-classic"
}, },
"custom": { "custom": {
"axisBorderShow": false,
"axisCenteredZero": false, "axisCenteredZero": false,
"axisColorMode": "text", "axisColorMode": "text",
"axisLabel": "", "axisLabel": "Tokens",
"axisPlacement": "auto", "axisPlacement": "auto",
"barAlignment": 0, "barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line", "drawStyle": "line",
"fillOpacity": 20, "fillOpacity": 0,
"gradientMode": "none", "gradientMode": "none",
"hideFrom": { "hideFrom": {
"legend": false, "legend": false,
"tooltip": false, "tooltip": false,
"viz": false "viz": false
}, },
"lineInterpolation": "smooth", "insertNulls": false,
"lineWidth": 2, "lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5, "pointSize": 5,
"scaleDistribution": { "scaleDistribution": {
"type": "linear" "type": "linear"
}, },
"showPoints": "never", "showPoints": "auto",
"spanNulls": false, "spanNulls": false,
"stacking": { "stacking": {
"group": "A", "group": "A",
...@@ -592,70 +443,67 @@ ...@@ -592,70 +443,67 @@
"mode": "absolute", "mode": "absolute",
"steps": [ "steps": [
{ {
"color": "green", "color": "green"
"value": null },
{
"color": "red",
"value": 80
} }
] ]
}, }
"unit": "percent",
"min": 0,
"max": 100
}, },
"overrides": [] "overrides": []
}, },
"gridPos": { "gridPos": {
"h": 8, "h": 8,
"w": 12, "w": 8,
"x": 12, "x": 8,
"y": 16 "y": 8
}, },
"id": 6, "id": 11,
"options": { "options": {
"legend": { "legend": {
"calcs": [ "calcs": [],
"mean", "displayMode": "list",
"max" "placement": "bottom",
],
"displayMode": "table",
"placement": "right",
"showLegend": true "showLegend": true
}, },
"tooltip": { "tooltip": {
"mode": "multi", "hideZeros": false,
"mode": "single",
"sort": "none" "sort": "none"
} }
}, },
"title": "GPU Engine Activity", "pluginVersion": "12.0.1",
"type": "timeseries",
"targets": [ "targets": [
{ {
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code", "editorMode": "code",
"expr": "DCGM_FI_PROF_GR_ENGINE_ACTIVE * 100", "expr": "nv_llm_http_service_input_sequence_tokens_sum / nv_llm_http_service_input_sequence_tokens_count",
"legendFormat": "GPU {{gpu}} Graphics Engine", "legendFormat": "ISL",
"range": true, "range": true,
"refId": "A" "refId": "A"
}, },
{ {
"datasource": { "datasource": {
"type": "prometheus", "type": "prometheus",
"uid": "prometheus" "uid": "P1809F7CD0C75ACF3"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "DCGM_FI_PROF_PIPE_TENSOR_ACTIVE * 100", "expr": "nv_llm_http_service_output_sequence_tokens_sum / nv_llm_http_service_output_sequence_tokens_count",
"legendFormat": "GPU {{gpu}} Tensor Core", "hide": false,
"instant": false,
"legendFormat": "OSL",
"range": true, "range": true,
"refId": "B" "refId": "B"
} }
] ],
"title": "Avg Input/Output Sequence Length",
"type": "timeseries"
}, },
{ {
"datasource": { "datasource": {
"type": "prometheus", "type": "prometheus",
"uid": "prometheus" "uid": "P1809F7CD0C75ACF3"
}, },
"fieldConfig": { "fieldConfig": {
"defaults": { "defaults": {
...@@ -663,26 +511,29 @@ ...@@ -663,26 +511,29 @@
"mode": "palette-classic" "mode": "palette-classic"
}, },
"custom": { "custom": {
"axisBorderShow": false,
"axisCenteredZero": false, "axisCenteredZero": false,
"axisColorMode": "text", "axisColorMode": "text",
"axisLabel": "", "axisLabel": "",
"axisPlacement": "auto", "axisPlacement": "left",
"barAlignment": 0, "barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line", "drawStyle": "line",
"fillOpacity": 20, "fillOpacity": 0,
"gradientMode": "none", "gradientMode": "none",
"hideFrom": { "hideFrom": {
"legend": false, "legend": false,
"tooltip": false, "tooltip": false,
"viz": false "viz": false
}, },
"insertNulls": false,
"lineInterpolation": "smooth", "lineInterpolation": "smooth",
"lineWidth": 2, "lineWidth": 1,
"pointSize": 5, "pointSize": 5,
"scaleDistribution": { "scaleDistribution": {
"type": "linear" "type": "linear"
}, },
"showPoints": "never", "showPoints": "auto",
"spanNulls": false, "spanNulls": false,
"stacking": { "stacking": {
"group": "A", "group": "A",
...@@ -697,208 +548,79 @@ ...@@ -697,208 +548,79 @@
"mode": "absolute", "mode": "absolute",
"steps": [ "steps": [
{ {
"color": "green", "color": "green"
"value": null },
{
"color": "red",
"value": 79.9954
} }
] ]
}, }
"unit": "binBps"
}, },
"overrides": [] "overrides": []
}, },
"gridPos": { "gridPos": {
"h": 8, "h": 8,
"w": 12, "w": 8,
"x": 0, "x": 16,
"y": 24 "y": 8
}, },
"id": 7, "id": 1,
"options": { "options": {
"legend": { "legend": {
"calcs": [ "calcs": [],
"mean", "displayMode": "list",
"max" "placement": "bottom",
],
"displayMode": "table",
"placement": "right",
"showLegend": true "showLegend": true
}, },
"tooltip": { "tooltip": {
"mode": "multi", "hideZeros": false,
"mode": "single",
"sort": "none" "sort": "none"
} }
}, },
"title": "PCIe Bandwidth", "pluginVersion": "12.0.1",
"type": "timeseries",
"targets": [ "targets": [
{ {
"datasource": { "datasource": {
"type": "prometheus", "type": "prometheus",
"uid": "prometheus" "uid": "P1809F7CD0C75ACF3"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "rate(DCGM_FI_PROF_PCIE_RX_BYTES[10s])", "exemplar": false,
"legendFormat": "GPU {{gpu}} PCIe RX", "expr": "DCGM_FI_DEV_GPU_UTIL",
"instant": false,
"legendFormat": "{{__name__}} (%)",
"range": true, "range": true,
"refId": "A" "refId": "A"
}, },
{ {
"datasource": { "datasource": {
"type": "prometheus", "type": "prometheus",
"uid": "prometheus" "uid": "P1809F7CD0C75ACF3"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "rate(DCGM_FI_PROF_PCIE_TX_BYTES[10s])", "exemplar": false,
"legendFormat": "GPU {{gpu}} PCIe TX", "expr": "DCGM_FI_DEV_POWER_USAGE",
"hide": false,
"instant": false,
"legendFormat": "{{__name__}} (Watts)",
"range": true, "range": true,
"refId": "B" "refId": "B"
} }
]
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 50
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 6,
"x": 12,
"y": 24
},
"id": 8,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "10.0.0",
"title": "Average GPU Utilization",
"type": "gauge",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "avg(DCGM_FI_DEV_GPU_UTIL)",
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
]
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 70
},
{
"color": "red",
"value": 85
}
]
},
"unit": "celsius"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 6,
"x": 18,
"y": 24
},
"id": 9,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
], ],
"fields": "", "title": "DCGM GPU Utilization",
"values": false "type": "timeseries"
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "10.0.0",
"title": "Max GPU Temperature",
"type": "gauge",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "max(DCGM_FI_DEV_GPU_TEMP)",
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
]
} }
], ],
"refresh": "5s", "preload": false,
"schemaVersion": 36, "refresh": "",
"style": "dark", "schemaVersion": 41,
"tags": [ "tags": [
"dcgm", "Dynamo",
"gpu", "DCGM",
"nvidia" "etcd",
"NATS"
], ],
"templating": { "templating": {
"list": [] "list": []
...@@ -908,9 +630,8 @@ ...@@ -908,9 +630,8 @@
"to": "now" "to": "now"
}, },
"timepicker": {}, "timepicker": {},
"timezone": "", "timezone": "browser",
"title": "DCGM GPU Monitoring Dashboard", "title": "Dynamo Dashboard",
"uid": "dcgm-dashboard", "uid": "a7d3733f-f8e7-423a-ab4b-b18e3d7d0357",
"version": 1, "version": 5
"weekStart": ""
} }
\ No newline at end of file
...@@ -33,14 +33,23 @@ scrape_configs: ...@@ -33,14 +33,23 @@ scrape_configs:
static_configs: static_configs:
- targets: ['dcgm-exporter:9400'] # on the "monitoring" network - targets: ['dcgm-exporter:9400'] # on the "monitoring" network
# This is a demo service that needs to be launched manually. See components/metrics/README.md
# Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8000/tcp
- job_name: 'llm-demo'
scrape_interval: 10s
static_configs:
- targets: ['host.docker.internal:8000'] # on the "monitoring" network
# This is another demo aggregator that needs to be launched manually. See components/metrics/README.md
# Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 9091/tcp
- job_name: 'metrics-aggregation-service'
scrape_interval: 2s
static_configs:
# - targets: ['localhost:9091'] # metrics aggregation service on host
- targets: ['host.docker.internal:9091'] # metrics aggregation service on host
# Uncomment to see its own Prometheus metrics # Uncomment to see its own Prometheus metrics
# - job_name: 'prometheus' # - job_name: 'prometheus'
# scrape_interval: 5s # scrape_interval: 5s
# static_configs: # static_configs:
# - targets: ['prometheus:9090'] # on the "monitoring" network # - targets: ['prometheus:9090'] # on the "monitoring" network
# Uncomment to see the metrics-aggregation-service metrics
# - job_name: 'metrics-aggregation-service'
# scrape_interval: 2s
# static_configs:
# - targets: ['host.docker.internal:9091'] # metrics aggregation service on host
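
The latency panels in the new dashboard divide a `_sum` series by its `_count` series and multiply by 1000, for example `1000*(nv_llm_http_service_time_to_first_token_seconds_sum/nv_llm_http_service_time_to_first_token_seconds_count)`. Because both series are cumulative, that expression gives the average since the counters started rather than a recent average. The sketch below is illustrative only: the function names and sample values are hypothetical and not part of this commit. It contrasts the cumulative form with an average taken over the interval between two scrapes, which is what a `rate(..._sum[1m]) / rate(..._count[1m])` style query would approximate.

```python
# Hypothetical helpers; the names and sample numbers are assumptions for illustration.

def cumulative_avg_ms(sum_seconds, count):
    """Mirror the panel expression 1000 * (_sum / _count): average since counter start."""
    if count == 0:
        return float("nan")  # nothing observed yet, so the average is undefined
    return 1000.0 * sum_seconds / count


def windowed_avg_ms(prev, curr):
    """Average over the interval between two scrapes.

    prev and curr are (sum_seconds, count) tuples taken from consecutive scrapes.
    """
    d_sum = curr[0] - prev[0]
    d_count = curr[1] - prev[1]
    if d_count <= 0:
        return float("nan")  # no new observations (or a counter reset) in the window
    return 1000.0 * d_sum / d_count


if __name__ == "__main__":
    earlier = (2.0, 4)  # 4 requests observed, 2.0 s of latency in total
    later = (2.5, 6)    # 2 more requests, adding 0.5 s in total
    print(cumulative_avg_ms(*later))        # average over all 6 requests
    print(windowed_avg_ms(earlier, later))  # average over only the 2 new requests
```

With a 2s or 10s `scrape_interval` as configured above, the windowed form reacts to recent changes in latency, while the cumulative form flattens out as the request count grows.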