Commit d4acf518 authored by Jae-Won Chung, committed by GitHub

[Metrics] Fix KV cache usage percent metric multiproc (#28792)



The `vllm:kv_cache_usage_perc` Gauge metric is missing `multiprocess_mode="mostrecent"`, so in multiprocess mode it ends up returning one series per process (note the `pid` label) instead of a single value:

```
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="277"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="275"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="273"} 0.6530455880475035
...
```

The deprecated `vllm:gpu_cache_usage_perc` Gauge metric has `multiprocess_mode="mostrecent"`.
Signed-off-by: Jae-Won Chung <jwnchung@umich.edu>
parent ab01cd14
@@ -494,6 +494,7 @@ class PrometheusStatLogger(AggregateStatLoggerBase):
         gauge_kv_cache_usage = self._gauge_cls(
             name="vllm:kv_cache_usage_perc",
             documentation="KV-cache usage. 1 means 100 percent usage.",
+            multiprocess_mode="mostrecent",
             labelnames=labelnames,
         )
         self.gauge_kv_cache_usage = make_per_engine(
...
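
To illustrate why the one-line change matters, here is a minimal, standard-library-only sketch of the two aggregation behaviors. It is a simplified model, not the Prometheus client's actual implementation: `Sample`, `collect_all`, and `collect_mostrecent` are hypothetical names, and the timestamps are made up. It assumes that the default gauge mode in multiprocess setups exposes one series per `pid`, while `"mostrecent"` exposes only the value written last by any process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One gauge value written by one process (hypothetical model type)."""
    pid: int
    value: float
    timestamp: float  # assumed time of the last write in that process


def collect_all(samples):
    """Model of the default mode: one series per process, keyed by pid."""
    return {s.pid: s.value for s in samples}


def collect_mostrecent(samples):
    """Model of "mostrecent": a single value, taken from the latest write."""
    return max(samples, key=lambda s: s.timestamp).value


# Values mirror the output in the commit message; timestamps are invented.
samples = [
    Sample(pid=277, value=0.0, timestamp=0.0),
    Sample(pid=275, value=0.0, timestamp=0.0),
    Sample(pid=273, value=0.6530455880475035, timestamp=12.5),
]

per_process = collect_all(samples)          # three series, two of them stale zeros
single_value = collect_mostrecent(samples)  # only the value written last

print(per_process)
print(single_value)
```

In this model, a dashboard reading the default output would have to know which `pid` holds the live value, whereas the `"mostrecent"` output needs no such knowledge.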