OpenDAS / dynamo
Commit ac13ed06, authored Mar 06, 2025 by Neelay Shah, committed by GitHub Mar 06, 2025
refactor: rename count to metrics and move location (#21)
Parent: 1b96c2c4
Showing 15 changed files with 102 additions and 109 deletions
.github/workflows/pre-merge-rust.yml (+1 -1)
applications/llm/count/visualization/docker-compose.yml (+0 -63)
components/metrics/Cargo.lock (+21 -21)
components/metrics/Cargo.toml (+4 -3)
components/metrics/README.md (+9 -9)
components/metrics/src/bin/mock_worker.rs (+0 -0)
components/metrics/src/lib.rs (+1 -1)
components/metrics/src/main.rs (+3 -3)
deploy/docker-compose.yml (+47 -0)
deploy/metrics/README.md (+15 -7)
deploy/metrics/grafana-dashboard-providers.yml (+0 -0)
deploy/metrics/grafana-datasources.yml (+0 -0)
deploy/metrics/grafana.json (+0 -0)
deploy/metrics/prometheus.yml (+0 -0)
dynemo.code-workspace (+1 -1)
.github/workflows/pre-merge-rust.yml
@@ -40,7 +40,7 @@ jobs:
   pre-merge-rust:
     runs-on: ubuntu-latest
     strategy:
-      matrix: { dir: ['lib/runtime', 'lib/llm', 'lib/bindings/c', 'lib/bindings/python', 'launch/dynemo-run', 'applications/llm/count', 'examples/rust'] }
+      matrix: { dir: ['lib/runtime', 'lib/llm', 'lib/bindings/c', 'lib/bindings/python', 'launch/dynemo-run', 'components/metrics', 'examples/rust'] }
     permissions:
       contents: read
     steps:
...
applications/llm/count/visualization/docker-compose.yml (deleted, mode 100644 → 0, last present at 1b96c2c4)
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      # These provide the web console functionality
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    restart: unless-stopped
    # TODO: Use more explicit networking setup when count is containerized
    #ports:
    #  - "9090:9090"
    #networks:
    #  - monitoring
    network_mode: "host"

  grafana:
    image: grafana/grafana-enterprise:latest
    container_name: grafana
    volumes:
      - ./grafana.json:/etc/grafana/provisioning/dashboards/llm-worker-dashboard.json
      - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
      - ./grafana-dashboard-providers.yml:/etc/grafana/provisioning/dashboards/dashboard-providers.yml
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_INSTALL_PLUGINS=grafana-piechart-panel
      # Default min interval is 5s, but can be configured lower
      - GF_DASHBOARDS_MIN_REFRESH_INTERVAL=2s
    restart: unless-stopped
    # TODO: Use more explicit networking setup when count is containerized
    #ports:
    #  - "3000:3000"
    #networks:
    #  - monitoring
    network_mode: "host"
    depends_on:
      - prometheus

networks:
  monitoring:
    driver: bridge
applications/llm/count/Cargo.lock → components/metrics/Cargo.lock
@@ -731,27 +731,6 @@ version = "0.8.7"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "773648b94d0e5d620f64f280777445740e61fe701025087ec8b57f45c791888b"

-[[package]]
-name = "count"
-version = "0.1.0"
-dependencies = [
- "axum 0.6.20",
- "clap",
- "dynemo-llm",
- "dynemo-runtime",
- "futures",
- "opentelemetry",
- "opentelemetry-prometheus",
- "prometheus",
- "rand",
- "reqwest 0.11.27",
- "serde",
- "serde_json",
- "thiserror 1.0.69",
- "tokio",
- "tracing",
-]
-
 [[package]]
 name = "cpufeatures"
 version = "0.2.17"
@@ -2216,6 +2195,27 @@ dependencies = [
  "autocfg",
 ]

+[[package]]
+name = "metrics"
+version = "0.1.0"
+dependencies = [
+ "axum 0.6.20",
+ "clap",
+ "dynemo-llm",
+ "dynemo-runtime",
+ "futures",
+ "opentelemetry",
+ "opentelemetry-prometheus",
+ "prometheus",
+ "rand",
+ "reqwest 0.11.27",
+ "serde",
+ "serde_json",
+ "thiserror 1.0.69",
+ "tokio",
+ "tracing",
+]
+
 [[package]]
 name = "mime"
 version = "0.3.17"
...
applications/llm/count/Cargo.toml → components/metrics/Cargo.toml
@@ -14,15 +14,16 @@
 # limitations under the License.

 [package]
-name = "count"
+name = "metrics"
 version = "0.1.0"
 edition = "2021"
 license = "Apache-2.0"

 [dependencies]
 # local
-dynemo-runtime = { path = "../../../lib/runtime" }
-dynemo-llm = { path = "../../../lib/llm" }
+dynemo-runtime = { path = "../../lib/runtime" }
+dynemo-llm = { path = "../../lib/llm" }

 # workspace - todo
...
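The local dependency paths in the Cargo.toml hunk lose one `../` because the crate moved from a directory three levels below the repository root (`applications/llm/count`) to one two levels below (`components/metrics`). The sketch below checks this lexically, without touching the filesystem. The `resolve` helper is hypothetical and written only for this illustration; Cargo does its own path resolution.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: lexically resolve `rel` against `base`.
/// A `..` that would climb above the starting directory is kept in the result,
/// so a path that escapes the repository root stays visible.
fn resolve(base: &str, rel: &str) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in Path::new(base).join(rel).components() {
        match comp {
            Component::CurDir => {}
            Component::ParentDir => {
                // Only cancel against a real directory name, never against "..".
                if matches!(out.components().next_back(), Some(Component::Normal(_))) {
                    out.pop();
                } else {
                    out.push("..");
                }
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

fn main() {
    // Old crate location with the old relative path.
    let old = resolve("applications/llm/count", "../../../lib/runtime");
    // New crate location with the shortened relative path.
    let new = resolve("components/metrics", "../../lib/runtime");
    // New crate location with the stale, three-level path left unchanged.
    let stale = resolve("components/metrics", "../../../lib/runtime");

    println!("old   -> {}", old.display());
    println!("new   -> {}", new.display());
    println!("stale -> {}", stale.display());
    assert_eq!(old, new);
    assert_ne!(stale, new);
}
```

Both the old and new manifests point at the same `lib/runtime` directory, while keeping the three-level path after the move would resolve outside the repository.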
applications/llm/count/README.md → components/metrics/README.md
-# Count
+# Metrics

 ## Quickstart

-To start `count`, simply point it at the namespace/component/endpoint trio that
+To start `metrics`, simply point it at the namespace/component/endpoint trio that
 you're interested in observing metrics from. This will scrape statistics from
 the services associated with that endpoint, do some postprocessing on them,
 and then publish an event with the postprocessed data.

 ```bash
 # For more details, try DYN_LOG=debug
-DYN_LOG=info cargo run --bin count -- --namespace dynemo --component backend --endpoint generate
+DYN_LOG=info cargo run --bin metrics -- --namespace dynemo --component backend --endpoint generate

-# 2025-02-26T18:45:05.467026Z INFO count: Creating unique instance of Count at dynemo/components/count/instance
+# 2025-02-26T18:45:05.467026Z INFO metrics: Creating unique instance of Metrics at dynemo/components/metrics/instance
-# 2025-02-26T18:45:05.472146Z INFO count: Scraping service dynemo_init_backend_720278f8 and filtering on subject dynemo_init_backend_720278f8.generate
+# 2025-02-26T18:45:05.472146Z INFO metrics: Scraping service dynemo_backend_720278f8 and filtering on subject dynemo_backend_720278f8.generate
 # ...
 ```

 With no matching endpoints running, you should see warnings in the logs:

 ```bash
-2025-02-26T18:45:06.474161Z WARN count: No endpoints found matching subject dynemo_init_backend_720278f8.generate
+2025-02-26T18:45:06.474161Z WARN metrics: No endpoints found matching subject dynemo_backend_720278f8.generate
 ```

 To see metrics published to a matching endpoint, you can use the
@@ -32,10 +32,10 @@ cargo run --bin mock_worker
 After a matching endpoint gets started, you should see the warnings go away
 since the endpoint will automatically get discovered.

-When stats are found from the target endpoints being listened on, count will
-aggregate and publish some metrics as both an event and to a prometheus web server:
+When stats are found from target endpoints, the metrics component will
+aggregate and publish metrics as both events and as updates to a prometheus server:

 ```
-2025-02-28T04:05:58.077901Z INFO count: Aggregated metrics: ProcessedEndpoints { endpoints: [Endpoint { name: "worker-7587884888253033398", subject: "dynemo_init_backend_720278f8.generate-694d951a80e06bb6", data: ForwardPassMetrics { request_active_slots: 58, request_total_slots: 100, kv_active_blocks: 77, kv_total_blocks: 100 } }, Endpoint { name: "worker-7587884888253033401", subject: "dynemo_init_backend_720278f8.generate-694d951a80e06bb9", data: ForwardPassMetrics { request_active_slots: 71, request_total_slots: 100, kv_active_blocks: 29, kv_total_blocks: 100 } }], worker_ids: [7587884888253033398, 7587884888253033401], load_avg: 53.0, load_std: 24.0 }
+2025-02-28T04:05:58.077901Z INFO metrics: Aggregated metrics: ProcessedEndpoints { endpoints: [Endpoint { name: "worker-7587884888253033398", subject: "dynemo_backend_720278f8.generate-694d951a80e06bb6", data: ForwardPassMetrics { request_active_slots: 58, request_total_slots: 100, kv_active_blocks: 77, kv_total_blocks: 100 } }, Endpoint { name: "worker-7587884888253033401", subject: "dynemo_backend_720278f8.generate-694d951a80e06bb9", data: ForwardPassMetrics { request_active_slots: 71, request_total_slots: 100, kv_active_blocks: 29, kv_total_blocks: 100 } }], worker_ids: [7587884888253033398, 7587884888253033401], load_avg: 53.0, load_std: 24.0 }
 ```

 To see the metrics being published in prometheus format, you can run:
...
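The aggregated log line in the README hunk reports `load_avg: 53.0` and `load_std: 24.0` for two workers whose `kv_active_blocks` are 77 and 29. Those numbers are consistent with a mean and a population standard deviation taken over `kv_active_blocks`. The sketch below reproduces them under that assumption. The struct only mirrors the field names printed in the log, and `mean_and_std` is a hypothetical helper; neither is taken from the `metrics` crate, whose actual `postprocess_metrics` may differ.

```rust
/// Stand-in for the `ForwardPassMetrics` values printed in the log line.
/// Field names come from the log; the type itself is an assumption.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
struct ForwardPassMetrics {
    request_active_slots: u64,
    request_total_slots: u64,
    kv_active_blocks: u64,
    kv_total_blocks: u64,
}

/// Hypothetical helper: mean and population standard deviation of a slice.
/// Returns (0.0, 0.0) for an empty slice to avoid dividing by zero.
fn mean_and_std(values: &[f64]) -> (f64, f64) {
    if values.is_empty() {
        return (0.0, 0.0);
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    (mean, variance.sqrt())
}

/// The two workers from the log line above.
fn sample_workers() -> Vec<ForwardPassMetrics> {
    vec![
        ForwardPassMetrics {
            request_active_slots: 58,
            request_total_slots: 100,
            kv_active_blocks: 77,
            kv_total_blocks: 100,
        },
        ForwardPassMetrics {
            request_active_slots: 71,
            request_total_slots: 100,
            kv_active_blocks: 29,
            kv_total_blocks: 100,
        },
    ]
}

fn main() {
    // Assumption: "load" is the number of active KV blocks per worker.
    let loads: Vec<f64> = sample_workers()
        .iter()
        .map(|m| m.kv_active_blocks as f64)
        .collect();
    let (load_avg, load_std) = mean_and_std(&loads);
    println!("load_avg: {load_avg:.1}, load_std: {load_std:.1}");
    // prints "load_avg: 53.0, load_std: 24.0"
}
```

Averaging `request_active_slots` instead would give 64.5, which does not match the log, so `kv_active_blocks` is the more plausible input.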
applications/llm/count/src/bin/mock_worker.rs → components/metrics/src/bin/mock_worker.rs (file moved)
applications/llm/count/src/lib.rs → components/metrics/src/lib.rs
@@ -13,7 +13,7 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.

-//! Library functions for the count application.
+//! Library functions for the metrics application.

 use axum::{routing::get, Router};
 use prometheus::{register_counter_vec, register_gauge_vec};
...
applications/llm/count/src/main.rs → components/metrics/src/main.rs
@@ -13,7 +13,7 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.

-//! Count is a metrics aggregator designed to operate within a namespace and collect
+//! Metrics is a metrics aggregator designed to operate within a namespace and collect
 //! metrics from all workers.
 //!
 //! Metrics will collect for now:
@@ -38,12 +38,12 @@ use futures::stream::StreamExt;
 use std::sync::Arc;

 // Import from our library
-use count::{
+use metrics::{
     collect_endpoints, extract_metrics, postprocess_metrics, LLMWorkerLoadCapacityConfig,
     PrometheusMetricsServer,
 };

-/// CLI arguments for the count application
+/// CLI arguments for the metrics application
 #[derive(Parser, Debug)]
 #[command(author, version, about, long_about = None)]
 struct Args {
...
deploy/docker-compose.yml
@@ -29,3 +29,50 @@ services:
     ports:
       - 2379:2379
       - 2380:2380
+
+  prometheus:
+    image: prom/prometheus:latest
+    container_name: prometheus
+    volumes:
+      - ./metrics/prometheus.yml:/etc/prometheus/prometheus.yml
+    command:
+      - '--config.file=/etc/prometheus/prometheus.yml'
+      - '--storage.tsdb.path=/prometheus'
+      # These provide the web console functionality
+      - '--web.console.libraries=/etc/prometheus/console_libraries'
+      - '--web.console.templates=/etc/prometheus/consoles'
+      - '--web.enable-lifecycle'
+    restart: unless-stopped
+    # TODO: Use more explicit networking setup when metrics is containerized
+    #ports:
+    #  - "9090:9090"
+    #networks:
+    #  - monitoring
+    network_mode: "host"
+    profiles: [metrics]
+
+  grafana:
+    image: grafana/grafana-enterprise:latest
+    container_name: grafana
+    volumes:
+      - ./metrics/grafana.json:/etc/grafana/provisioning/dashboards/llm-worker-dashboard.json
+      - ./metrics/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
+      - ./metrics/grafana-dashboard-providers.yml:/etc/grafana/provisioning/dashboards/dashboard-providers.yml
+    environment:
+      - GF_SECURITY_ADMIN_USER=admin
+      - GF_SECURITY_ADMIN_PASSWORD=admin
+      - GF_USERS_ALLOW_SIGN_UP=false
+      - GF_INSTALL_PLUGINS=grafana-piechart-panel
+      # Default min interval is 5s, but can be configured lower
+      - GF_DASHBOARDS_MIN_REFRESH_INTERVAL=2s
+    restart: unless-stopped
+    # TODO: Use more explicit networking setup when metrics is containerized
+    #ports:
+    #  - "3000:3000"
+    #networks:
+    #  - monitoring
+    network_mode: "host"
+    profiles: [metrics]
+    depends_on:
+      - prometheus
applications/llm/count/visualization/README.md → deploy/metrics/README.md
@@ -11,17 +11,23 @@ This directory contains configuration for visualizing metrics from the metrics a
 1. Make sure Docker and Docker Compose are installed on your system

-2. Start `count` and the corresponding `examples/rust/service_metrics/bin/server.rs` that populates dummy KV Cache metrics.
+2. Start the `components/metrics` application to begin monitoring for metric events from dynemo workers
+   and aggregating them on a prometheus metrics endpoint: `http://localhost:9091/metrics`.

-3. Start the visualization stack:
+3. Start worker(s) that publishes KV Cache metrics.
+   - For quick testing, `examples/rust/service_metrics/bin/server.rs` can populate dummy KV Cache metrics.
+   - For a real workflow with real data, see the KV Routing example in `examples/python_rs/llm/vllm`.
+
+4. Start the visualization stack:

 ```bash
-docker compose up -d
+docker compose --profile metrics up -d
 ```

-4. Web servers started:
-   - Grafana: http://localhost:3000 (default login: admin/admin)
-   - Prometheus: http://localhost:9090
+5. Web servers started:
+   - Grafana: `http://localhost:3000` (default login: admin/admin) (started by docker compose)
+   - Prometheus Server: `http://localhost:9090` (started by docker compose)
+   - Prometheus Metrics Endpoint: `http://localhost:9091/metrics` (started by `components/metrics` application)

 ## Configuration
@@ -40,7 +46,7 @@ Grafana is pre-configured with:
 ## Required Files

 The following configuration files should be present in this directory:
-- `docker-compose.yml`: Defines the Prometheus and Grafana services
+- `..\docker-compose.yml`: Defines the Prometheus and Grafana services
 - `prometheus.yml`: Contains Prometheus scraping configuration
 - `grafana.json`: Contains Grafana dashboard configuration
 - `grafana-datasources.yml`: Contains Grafana datasource configuration
@@ -55,6 +61,8 @@ The prometheus service exposes the following metrics:
 - `llm_requests_total_slots`: Total available request slots
 - `llm_kv_blocks_active`: Number of active KV blocks
 - `llm_kv_blocks_total`: Total KV blocks available
+- `llm_kv_hit_rate_isl_blocks`: Cumulative count of ISL blocks in KV hit rate events
+- `llm_kv_hit_rate_overlap_blocks`: Cumulative count of overlapping blocks in KV hit rate events

 ## Troubleshooting
...
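The deploy README hunk names the metrics served at `http://localhost:9091/metrics`, including the two `llm_kv_hit_rate_*` counters added in this commit. As a rough illustration of how a consumer might read them, the sketch below parses a Prometheus text-format payload and derives two ratios. The sample payload, its values, and the `worker_id` label are invented for the example, and treating overlap blocks divided by ISL blocks as a hit rate is an assumption rather than something the commit states. `sum_by_metric` and `ratio` are hypothetical helpers.

```rust
use std::collections::HashMap;

/// Made-up payload in Prometheus text exposition format. Metric names come
/// from the README; values and the `worker_id` label are assumptions.
const SAMPLE: &str = r#"
# HELP llm_kv_blocks_active Number of active KV blocks
# TYPE llm_kv_blocks_active gauge
llm_kv_blocks_active{worker_id="1"} 77
llm_kv_blocks_active{worker_id="2"} 29
llm_kv_blocks_total{worker_id="1"} 100
llm_kv_blocks_total{worker_id="2"} 100
llm_kv_hit_rate_isl_blocks{worker_id="1"} 40
llm_kv_hit_rate_overlap_blocks{worker_id="1"} 10
"#;

/// Sum sample values per metric name, ignoring labels, comments, and blank lines.
/// Assumes samples carry no trailing timestamp.
fn sum_by_metric(body: &str) -> HashMap<String, f64> {
    let mut totals = HashMap::new();
    for line in body.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // The value is the last space-separated token on the line.
        let Some((series, value)) = line.rsplit_once(' ') else {
            continue;
        };
        let Ok(value) = value.parse::<f64>() else {
            continue;
        };
        // Strip the `{label="..."}` part, if present, to get the metric name.
        let name = series.split('{').next().unwrap_or(series).trim();
        *totals.entry(name.to_string()).or_insert(0.0) += value;
    }
    totals
}

/// Ratio of two summed metrics, or `None` if either is missing or the
/// denominator is zero.
fn ratio(totals: &HashMap<String, f64>, num: &str, den: &str) -> Option<f64> {
    let d = *totals.get(den)?;
    if d == 0.0 {
        return None;
    }
    Some(*totals.get(num)? / d)
}

fn main() {
    let totals = sum_by_metric(SAMPLE);
    if let Some(u) = ratio(&totals, "llm_kv_blocks_active", "llm_kv_blocks_total") {
        println!("KV block utilization: {u:.2}");
    }
    if let Some(h) = ratio(
        &totals,
        "llm_kv_hit_rate_overlap_blocks",
        "llm_kv_hit_rate_isl_blocks",
    ) {
        println!("KV overlap / ISL blocks: {h:.2}");
    }
}
```

In a running setup the payload would come from an HTTP GET against the endpoint started by `components/metrics`; the string constant keeps the sketch self-contained.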
applications/llm/count/visualization/grafana-dashboard-providers.yml → deploy/metrics/grafana-dashboard-providers.yml (file moved)
applications/llm/count/visualization/grafana-datasources.yml → deploy/metrics/grafana-datasources.yml (file moved)
applications/llm/count/visualization/grafana.json → deploy/metrics/grafana.json (file moved)
applications/llm/count/visualization/prometheus.yml → deploy/metrics/prometheus.yml (file moved)
dynemo.code-workspace
@@ -6,7 +6,7 @@
   ],
   "settings": {
     "rust-analyzer.linkedProjects": [
-      "applications/llm/count/Cargo.toml",
+      "components/metrics/Cargo.toml",
       "lib/llm/Cargo.toml",
       "lib/runtime/Cargo.toml",
       "lib/bindings/python/Cargo.toml",
...