- 01 Dec, 2025 13 commits
-
-
shivampr authored
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:

vllm:kv_block_lifetime_seconds - total lifetime from allocation to free
vllm:kv_block_idle_before_evict_seconds - idle duration before eviction
vllm:kv_block_reuse_gap_seconds - time between consecutive reuses of the same block

These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. The implementation uses monotonic timestamps for accuracy and 1% sampling for minimal overhead (~48 bytes/block); it is fully thread-safe and has zero runtime cost when disabled.

Two new runtime flags are introduced:

--kv-cache-metrics - enable KV cache residency metrics
--kv-cache-metrics-sample - control sampling ratio (default: 0.01)

Signed-off-by: Shivam <shivamprasad91@gmail.com>
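The sampling and monotonic-timestamp approach described in this commit message can be illustrated with a minimal sketch. It is not the code from the commit: the class name `KVBlockResidencySketch`, its hook methods, and the plain lists standing in for Prometheus histograms are all hypothetical. It also treats "free" and "evict" as a single event and is not thread-safe, unlike the implementation the message describes.

```python
import random
import time


class KVBlockResidencySketch:
    """Hypothetical sampled residency tracker (illustration only)."""

    def __init__(self, sample_rate=0.01, clock=time.monotonic, rng=None):
        self.sample_rate = sample_rate      # fraction of blocks to track
        self.clock = clock                  # monotonic clock, immune to wall-clock jumps
        self.rng = rng or random.Random()
        self.tracked = {}                   # block_id -> [allocated_at, last_access_at]
        # Plain lists stand in for the three histograms.
        self.lifetime_s = []
        self.idle_before_evict_s = []
        self.reuse_gap_s = []

    def on_alloc(self, block_id):
        # Decide once, at allocation, whether this block is sampled.
        if self.rng.random() < self.sample_rate:
            now = self.clock()
            self.tracked[block_id] = [now, now]

    def on_access(self, block_id):
        times = self.tracked.get(block_id)
        if times is None:
            return                          # unsampled block: nothing to record
        now = self.clock()
        self.reuse_gap_s.append(now - times[1])
        times[1] = now

    def on_evict(self, block_id):
        times = self.tracked.pop(block_id, None)
        if times is None:
            return
        now = self.clock()
        self.lifetime_s.append(now - times[0])
        self.idle_before_evict_s.append(now - times[1])


# Deterministic walk-through with a fake clock and every block sampled.
ticks = iter([0.0, 2.0, 5.0])
tracker = KVBlockResidencySketch(sample_rate=1.0, clock=lambda: next(ticks))
tracker.on_alloc(7)    # clock reads 0.0
tracker.on_access(7)   # clock reads 2.0
tracker.on_evict(7)    # clock reads 5.0
```

Deciding at allocation time keeps the per-access cost for unsampled blocks to a single dictionary lookup, which is one plausible way to keep overhead low at a 1% sampling ratio.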
-
knlnguyen1802 authored
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Chenguang Zheng <645327136@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
sangbumlikeagod authored
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
-
FredericOdermatt authored
Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch>
-
Shengqi Chen authored
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
-
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Co-authored-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
-
Fanli Lin authored
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Mickaël Seznec authored
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: Roger Wang <hey@rogerw.io>
-
daniel-salib authored
Signed-off-by: Daniel Salib <danielsalib@meta.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
-
wang.yuqi authored
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
-
Yifei Zhang authored
Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>
-
Shu Wang authored
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
-
- 30 Nov, 2025 13 commits
-
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Xingyu Liu authored
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Omer Ullman Argov authored
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Pleaplusone authored
Signed-off-by: ganyi <ygan@amd.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
朝 authored
Signed-off-by: BowTen <bowten@qq.com>
-
Vensen authored
Signed-off-by: vensen <vensenmu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
-
Huamin Li authored
Signed-off-by: Huamin Li <3ericli@gmail.com>
-
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
Xin Yang authored
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
-
- 29 Nov, 2025 14 commits
-
-
Jinzhen Lin authored
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Didier Durand authored
Signed-off-by: Didier Durand <durand.didier@gmail.com>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Jee Jee Li authored
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Tsukasa OI authored
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Mert Unsal authored
Signed-off-by: mertunsall <mertunsal1905@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
-