OpenDAS / dynamo
Commit 1ce7ba03
Authored Mar 07, 2025 by Ryan McCormick; committed by GitHub, Mar 07, 2025
feat: Enhance mock worker with mock KvHitRate events (#50)
Parent: 9f53922a
Showing 4 changed files with 107 additions and 40 deletions (+107 -40)
components/metrics/Cargo.lock (+1 -0)
components/metrics/Cargo.toml (+1 -0)
components/metrics/README.md (+28 -16)
components/metrics/src/bin/mock_worker.rs (+77 -24)

components/metrics/Cargo.lock

    @@ -2199,6 +2199,7 @@ dependencies = [
     name = "metrics"
     version = "0.1.0"
     dependencies = [
    + "async-nats",
      "axum 0.6.20",
      "clap",
      "dynemo-llm",
    ...

components/metrics/Cargo.toml

    @@ -28,6 +28,7 @@ dynemo-llm = { path = "../../lib/llm" }
     # workspace - todo
     # crates.io
    +async-nats = { version = "0.38", features = ["service"] }
     clap = { version = "4.5", features = ["derive", "env"] }
     serde = { version = "1", features = ["derive"] }
     serde_json = { version = "1" }
    ...

components/metrics/README.md

    @@ -2,11 +2,14 @@
     ## Quickstart
    -To start `metrics`, simply point it at the namespace/component/endpoint trio that
    -you're interested in observing metrics from. This will scrape statistics from
    -the services associated with that endpoint, do some postprocessing on them,
    -and then publish an event with the postprocessed data.
    +To start the `metrics` component, simply point it at the `namespace/component/endpoint` trio that
    +you're interested in observing metrics from.
    +This will:
    +1. Scrape statistics from the services associated with that `endpoint`, do some postprocessing, and aggregate them.
    +2. Listen for `KvHitRateEvent`s on `namespace/kv-hit-rate`, and aggregate them.
    +For example:
     ```bash
     # For more details, try DYN_LOG=debug
     DYN_LOG=info cargo run --bin metrics -- --namespace dynemo --component backend --endpoint generate
    @@ -16,24 +19,16 @@ DYN_LOG=info cargo run --bin metrics -- --namespace dynemo --component backend -
     # ...
     ```
    -With no matching endpoints running, you should see warnings in the logs:
    +With no matching endpoints running to collect stats from, you should see warnings in the logs:
     ```bash
     2025-02-26T18:45:06.474161Z WARN metrics: No endpoints found matching subject dynemo_backend_720278f8.generate
     ```
    -To see metrics published to a matching endpoint, you can use the
    -[mock_worker example](src/bin/mock_worker.rs) in this directory to launch
    -1 or more workers that publish LLM Metrics:
    -```bash
    -# Can run multiple workers in separate shells
    -cargo run --bin mock_worker
    -```
    -After a matching endpoint gets started, you should see the warnings go away
    -since the endpoint will automatically get discovered.
    +After a matching endpoint gets started, you should see the warnings stop
    +when the endpoint gets automatically discovered.
     When stats are found from target endpoints, the metrics component will
    -aggregate and publish metrics as both events and as updates to a prometheus server:
    +aggregate them and publish them to a prometheus server running on `localhost:9091/metrics` by default:
     ```
     2025-02-28T04:05:58.077901Z INFO metrics: Aggregated metrics: ProcessedEndpoints { endpoints: [Endpoint { name: "worker-7587884888253033398", subject: "dynemo_backend_720278f8.generate-694d951a80e06bb6", data: ForwardPassMetrics { request_active_slots: 58, request_total_slots: 100, kv_active_blocks: 77, kv_total_blocks: 100 } }, Endpoint { name: "worker-7587884888253033401", subject: "dynemo_backend_720278f8.generate-694d951a80e06bb9", data: ForwardPassMetrics { request_active_slots: 71, request_total_slots: 100, kv_active_blocks: 29, kv_total_blocks: 100 } }], worker_ids: [7587884888253033398, 7587884888253033401], load_avg: 53.0, load_std: 24.0 }
     ```
    @@ -51,3 +46,20 @@ curl localhost:9091/metrics
     # llm_kv_blocks_total{component="backend",endpoint="generate",worker_id="7587884888253033398"} 100
     # llm_kv_blocks_total{component="backend",endpoint="generate",worker_id="7587884888253033401"} 100
     ```
    +## Mock Worker
    +For convenience and debugging, there is a mock worker that registers a mock
    +`StatsHandler` with the `endpoint` and publishes mock `KvHitRateEvent`s on
    +`namespace/kv-hit-rate`.
    +```bash
    +# Can run multiple workers in separate shells to see aggregation as well.
    +DYN_LOG=info cargo run --bin mock_worker
    +```
    +**NOTE**: When using the mock worker, the data from the stats handler and the
    +events will be random and shouldn't be expected to correlate with each other.
    +## Real Worker
    +See the KV Routing example in `examples/python_rs/llm/vllm`.
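
Reviewer note: the sample log line in the README reports `load_avg: 53.0` and `load_std: 24.0` for two workers whose `kv_active_blocks` are 77 and 29. Those figures are consistent with a mean and a population standard deviation taken over `kv_active_blocks`. The sketch below reproduces them under that assumption; `load_stats` is a hypothetical helper written for illustration, not the metrics component's actual aggregation code.

```rust
/// Hypothetical helper: mean and population standard deviation of per-worker load.
/// Assumption: "load" means each worker's `kv_active_blocks`; the real
/// aggregation in the metrics component may be defined differently.
fn load_stats(kv_active_blocks: &[u64]) -> Option<(f64, f64)> {
    if kv_active_blocks.is_empty() {
        return None; // no workers discovered, nothing to aggregate
    }
    let n = kv_active_blocks.len() as f64;
    let mean = kv_active_blocks.iter().map(|&b| b as f64).sum::<f64>() / n;
    // Population variance: divide by n, not n - 1.
    let variance = kv_active_blocks
        .iter()
        .map(|&b| (b as f64 - mean).powi(2))
        .sum::<f64>()
        / n;
    Some((mean, variance.sqrt()))
}

fn main() {
    // kv_active_blocks values taken from the sample log line in the README.
    let (avg, std) = load_stats(&[77, 29]).expect("at least one worker");
    println!("load_avg: {avg:.1}, load_std: {std:.1}");
}
```

Returning `Option` keeps the "no matching endpoints" case explicit instead of dividing by zero when the worker list is empty.
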

components/metrics/src/bin/mock_worker.rs

    @@ -13,18 +13,25 @@
     // See the License for the specific language governing permissions and
     // limitations under the License.
    -use dynemo_llm::kv_router::protocols::ForwardPassMetrics;
    +use async_nats::service::endpoint::Stats;
    +use dynemo_llm::kv_router::{
    +    protocols::ForwardPassMetrics, scheduler::KVHitRateEvent, KV_HIT_RATE_SUBJECT,
    +};
     use dynemo_runtime::{
    +    component::Namespace,
         logging,
         pipeline::{
             async_trait, network::Ingress, AsyncEngine, AsyncEngineContextProvider, Error, ManyOut,
             ResponseStream, SingleIn,
         },
         protocols::annotated::Annotated,
    -    stream, DistributedRuntime, Result, Runtime, Worker,
    +    stream,
    +    traits::events::EventPublisher,
    +    DistributedRuntime, Result, Runtime, Worker,
     };
     use rand::Rng;
     use std::sync::Arc;
    +use tokio::time::{interval, Duration};
     fn main() -> Result<()> {
         logging::init();
    @@ -37,16 +44,16 @@ async fn app(runtime: Runtime) -> Result<()> {
         backend(distributed).await
     }
    -struct RequestHandler {}
    +struct MockRequestHandler {}
    -impl RequestHandler {
    +impl MockRequestHandler {
         fn new() -> Arc<Self> {
             Arc::new(Self {})
         }
     }
     #[async_trait]
    -impl AsyncEngine<SingleIn<String>, ManyOut<Annotated<String>>, Error> for RequestHandler {
    +impl AsyncEngine<SingleIn<String>, ManyOut<Annotated<String>>, Error> for MockRequestHandler {
         async fn generate(&self, input: SingleIn<String>) -> Result<ManyOut<Annotated<String>>> {
             let (data, ctx) = input.into_parts();
    @@ -61,28 +68,50 @@ impl AsyncEngine<SingleIn<String>, ManyOut<Annotated<String>>, Error> for Reques
         }
     }
    -async fn backend(runtime: DistributedRuntime) -> Result<()> {
    -    // attach an ingress to an engine
    -    let ingress = Ingress::for_engine(RequestHandler::new())?;
    -    // make the ingress discoverable via a component service
    -    // we must first create a service, then we can attach one more more endpoints
    -    runtime
    -        .namespace("dynemo")?
    -        .component("backend")?
    -        .service_builder()
    -        .create()
    -        .await?
    -        .endpoint("generate")
    -        .endpoint_builder()
    -        // Dummy stats handler to demonstrate how to attach a custom stats handler
    -        .stats_handler(|_stats| {
    +/// Spawns a background task that periodically publishes mock KV hit rate events
    +async fn mock_event_publisher(namespace: Namespace) {
    +    // NOTE: These events are just for testing, and shouldn't be interpreted
    +    // in correlation with the stats handler's data:
    +    // 1. The worker ID associated with the events here won't match the
    +    //    worker ID of the endpoint's service stats handler.
    +    // 2. These events aren't coming through the KV Router, so the metrics won't
    +    //    be reflective of the KV Router's performance.
    +    // 3. The data in these events aren't in sync with the stats handler's
    +    //    ForwardPassMetrics data, so they may not correlate well.
    +    let worker_id = rand::thread_rng().gen_range(1..=1000);
    +    let mut interval = interval(Duration::from_secs(1));
    +    loop {
    +        interval.tick().await;
    +        // Generate random KV hit rate event using a new thread_rng each time
    +        let isl_blocks = rand::thread_rng().gen_range(0..=100);
    +        let overlap_blocks = rand::thread_rng().gen_range(0..=isl_blocks);
    +        let event = KVHitRateEvent {
    +            worker_id,
    +            isl_blocks,
    +            overlap_blocks,
    +        };
    +        if let Err(e) = namespace.publish(KV_HIT_RATE_SUBJECT, &event).await {
    +            tracing::warn!("Failed to publish KV hit rate event: {e}");
    +        } else {
    +            tracing::info!(
    +                "Published KV hit rate event: worker_id={worker_id}, isl_blocks={isl_blocks}, overlap_blocks={overlap_blocks}, hit_rate={:.2}%",
    +                (overlap_blocks as f64 / isl_blocks as f64) * 100.0
    +            );
    +        }
    +    }
    +}
    +/// Generates mock forward pass metrics for stats handler
    +fn mock_stats_handler(_stats: Stats) -> serde_json::Value {
         println!("stats in: {:?}", _stats);
         let request_total_slots = 100;
    -    let request_active_slots = rand::thread_rng().gen_range(0..request_total_slots);
    +    let request_active_slots = rand::thread_rng().gen_range(0..=request_total_slots);
         let kv_total_blocks = 100;
    -    let kv_active_blocks = rand::thread_rng().gen_range(0..kv_total_blocks);
    +    let kv_active_blocks = rand::thread_rng().gen_range(0..=kv_total_blocks);
         let stats = ForwardPassMetrics {
             request_active_slots,
             request_total_slots,
    @@ -91,7 +120,31 @@ async fn backend(runtime: DistributedRuntime) -> Result<()> {
         };
         println!("stats out: {:?}", stats);
         serde_json::to_value(stats).unwrap()
    -})
    +}
    +async fn backend(runtime: DistributedRuntime) -> Result<()> {
    +    let namespace = runtime.namespace("dynemo")?;
    +    // Spawn background task for publishing KV hit rate events
    +    let namespace_clone = namespace.clone();
    +    tokio::spawn(async move {
    +        mock_event_publisher(namespace_clone).await;
    +    });
    +    // attach an ingress to an engine
    +    let ingress = Ingress::for_engine(MockRequestHandler::new())?;
    +    // make the ingress discoverable via a component service
    +    // we must first create a service, then we can attach one more more endpoints
    +    namespace
    +        .component("backend")?
    +        .service_builder()
    +        .create()
    +        .await?
    +        .endpoint("generate")
    +        .endpoint_builder()
    +        // Dummy stats handler to demonstrate how to attach a custom stats handler
    +        .stats_handler(mock_stats_handler)
             .handler(ingress)
             .start()
             .await
    ...
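
Reviewer note: `mock_event_publisher` draws `isl_blocks` from `0..=100`, so it can be 0, and the logged `hit_rate` then evaluates `0.0 / 0.0`, which is NaN for `f64`. A minimal sketch of a guarded calculation follows. `hit_rate_percent` is a hypothetical helper, and the `u32` field types are an assumption, since the definition of `KVHitRateEvent` is not part of this diff.

```rust
/// Hypothetical helper: KV hit rate as the percentage of input-sequence
/// blocks that overlapped with cached blocks.
/// Returns None when `isl_blocks` is 0, where the ratio is undefined.
/// Assumption: both counts fit in u32; the real event fields may use another type.
fn hit_rate_percent(isl_blocks: u32, overlap_blocks: u32) -> Option<f64> {
    if isl_blocks == 0 {
        return None;
    }
    Some(overlap_blocks as f64 / isl_blocks as f64 * 100.0)
}

fn main() {
    for (isl, overlap) in [(100, 25), (8, 8), (0, 0)] {
        match hit_rate_percent(isl, overlap) {
            Some(rate) => {
                println!("isl_blocks={isl}, overlap_blocks={overlap}, hit_rate={rate:.2}%")
            }
            None => println!("isl_blocks={isl}, overlap_blocks={overlap}, hit_rate=n/a"),
        }
    }
}
```

An alternative with the same effect would be to sample `isl_blocks` from `1..=100` in the mock, which avoids the zero case without changing the log format.
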