- 09 Dec, 2025 5 commits
-
-
Benjamin Chislett authored
Signed-off-by:Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by:Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by:gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Hubert de La Jonquiere authored
[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. (#30056)
-
Fanli Lin authored
Signed-off-by:Lin, Fanli <fanli.lin@intel.com>
-
Or Ozeri authored
Signed-off-by:Or Ozeri <oro@il.ibm.com>
-
Ming Yang authored
Signed-off-by:Ming Yang <minos.future@gmail.com>
-
- 08 Dec, 2025 3 commits
-
-
Simon Mo authored
Signed-off-by:simon-mo <simon.mo@hey.com>
Signed-off-by:Michael Goin <mgoin64@gmail.com>
Signed-off-by:Simon Mo <simon.mo@hey.com>
Co-authored-by:Mark McLoughlin <markmc@redhat.com>
Co-authored-by:Russell Bryant <rbryant@redhat.com>
Co-authored-by:Michael Goin <mgoin64@gmail.com>
Co-authored-by:Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by:Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by:Cyrus Leung <cyrus.tl.leung@gmail.com>
-
wang.yuqi authored
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686)
Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
-
Zhiyu authored
Signed-off-by:Zhiyu Cheng <zhiyuc@nvidia.com>
-
- 07 Dec, 2025 4 commits
-
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
Cyrus Leung authored
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
jeremyteboul authored
Signed-off-by:Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by:Jeremy Teboul <jeremyteboul@fb.com>
-
- 06 Dec, 2025 2 commits
-
-
Viacheslav authored
Signed-off-by:Viacheslav Barinov <viacheslav.teh@gmail.com>
-
redwrasse authored
Signed-off-by:redwrasse <mail@redwrasse.io>
-
- 05 Dec, 2025 4 commits
-
-
Russell Bryant authored
Signed-off-by:Russell Bryant <rbryant@redhat.com>
-
Yanan Cao authored
Signed-off-by:Yanan Cao <gmagogsfm@gmail.com>
-
Tiger Xu / Zhonghu Xu authored
Signed-off-by:Zhonghu Xu <xuzhonghu@huawei.com>
-
Hubert de La Jonquiere authored
Signed-off-by:hdlj-h <hubert@hcompany.ai>
-
- 04 Dec, 2025 8 commits
-
-
TimWang authored
Signed-off-by:Tim <tim.wang03@sap.com>
-
Tao Yun authored
Signed-off-by:taoyun <1069423820@qq.com>
Signed-off-by:Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by:Cyrus Leung <cyrus.tl.leung@gmail.com>
-
Shengqi Chen authored
Signed-off-by:Shengqi Chen <harry-chen@outlook.com>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
wang.yuqi authored
Signed-off-by:wang.yuqi <noooop@126.com>
Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by:gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
dtc authored
Signed-off-by:Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by:dtc <dtcccc@linux.alibaba.com>
Co-authored-by:Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
-
CYJiang authored
In Prometheus, counters always expose their actual numeric value under a metric name that ends in _total. We should document the base name, as this is what appears in the get_metrics() API.
Signed-off-by:CYJiang <86391540+googs1025@users.noreply.github.com>
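The naming relationship described in this commit message can be illustrated with a minimal sketch. The helper names and the metric name `example:requests` are hypothetical and are not part of vLLM or of any Prometheus client library; the sketch only shows how a counter's base name relates to the `_total` name seen in the exposition output.

```python
SUFFIX = "_total"


def exposed_counter_name(base_name: str) -> str:
    """Return the name a counter is exposed under (hypothetical helper)."""
    # Counters are exposed with a trailing "_total"; do not append it twice.
    return base_name if base_name.endswith(SUFFIX) else base_name + SUFFIX


def counter_base_name(exposed_name: str) -> str:
    """Return the base name to document (hypothetical helper)."""
    # Strip a trailing "_total" if present; leave other names unchanged.
    if exposed_name.endswith(SUFFIX):
        return exposed_name[: -len(SUFFIX)]
    return exposed_name
```

Under these assumptions, `example:requests` would be the name to document, while `example:requests_total` is what a scrape of the exposition endpoint would show.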
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
- 03 Dec, 2025 7 commits
-
-
bnellnm authored
Signed-off-by:Bill Nell <bnell@redhat.com>
Signed-off-by:bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by:Tyler Michael Smith <tyler@neuralmagic.com>
-
Lumis Chen authored
Signed-off-by:LuminolT <lumischen01@gmail.com>
Signed-off-by:Lumis Chen <lumischen01@gmail.com>
Co-authored-by:Russell Bryant <rbryant@redhat.com>
-
ioana ghiban authored
Signed-off-by:Ioana Ghiban <ioana.ghiban@arm.com>
-
ioana ghiban authored
Signed-off-by:Ioana Ghiban <ioana.ghiban@arm.com>
-
Amr Mahdi authored
Signed-off-by:Amr Mahdi <amrmahdi@meta.com>
-
Fadi Arafeh authored
Signed-off-by:Fadi Arafeh <fadi.arafeh@arm.com>
-
Russell Bryant authored
Signed-off-by:Russell Bryant <rbryant@redhat.com>
-
- 02 Dec, 2025 4 commits
-
-
wang.yuqi authored
Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Julien Denize authored
Signed-off-by:Julien Denize <julien.denize@mistral.ai>
Signed-off-by:Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by:Mickael Seznec <mickael@mistral.ai>
Signed-off-by:Roger Wang <hey@rogerw.io>
Co-authored-by:Roger Wang <hey@rogerw.io>
Co-authored-by:Mickael Seznec <mickael@mistral.ai>
-
Louie Tsai authored
Signed-off-by:Tsai, Louie <louie.tsai@intel.com>
Signed-off-by:Louie Tsai <louie.tsai@intel.com>
Co-authored-by:gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by:Li, Jiang <bigpyj64@gmail.com>
-
Shengqi Chen authored
Signed-off-by:Shengqi Chen <harry-chen@outlook.com>
-
- 01 Dec, 2025 3 commits
-
-
Kevin H. Luu authored
-
Finbarr Timbers authored
Signed-off-by:Finbarr Timbers <finbarrtimbers@gmail.com>
-
shivampr authored
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:
  vllm:kv_block_lifetime_seconds: total lifetime from allocation to free
  vllm:kv_block_idle_before_evict_seconds: idle duration before eviction
  vllm:kv_block_reuse_gap_seconds: time between consecutive reuses of the same block
These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates.
Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled.
Two new runtime flags are introduced:
  --kv-cache-metrics: enable KV cache residency metrics
  --kv-cache-metrics-sample: control sampling ratio (default: 0.01)
Signed-off-by:Shivam <shivamprasad91@gmail.com>
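A minimal sketch of how sampled residency tracking with monotonic timestamps might work is given below. It is not the code from this commit: the class name `BlockResidencySampler`, its method names, and the use of plain lists in place of Prometheus histograms are all assumptions made for illustration. The sketch also treats free and eviction as a single event, counts the gap from allocation to first access as a reuse gap, and omits the thread safety that the commit message describes.

```python
import random
import time
from typing import Callable, Dict, List, Optional


class BlockResidencySampler:
    """Hypothetical sketch: records timing for a sampled subset of blocks."""

    def __init__(
        self,
        sample_rate: float = 0.01,
        clock: Callable[[], float] = time.monotonic,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.sample_rate = sample_rate
        self.clock = clock  # monotonic clock, unaffected by wall-clock jumps
        self.rng = rng or random.Random()
        # block_id -> [allocation time, last access time]
        self._tracked: Dict[int, List[float]] = {}
        # Plain lists stand in for the three histograms.
        self.lifetimes: List[float] = []
        self.idle_before_evict: List[float] = []
        self.reuse_gaps: List[float] = []

    def on_alloc(self, block_id: int) -> None:
        # Track only a random fraction of blocks to bound overhead.
        if self.rng.random() < self.sample_rate:
            now = self.clock()
            self._tracked[block_id] = [now, now]

    def on_access(self, block_id: int) -> None:
        entry = self._tracked.get(block_id)
        if entry is None:
            return  # block was not sampled
        now = self.clock()
        self.reuse_gaps.append(now - entry[1])
        entry[1] = now

    def on_evict(self, block_id: int) -> None:
        entry = self._tracked.pop(block_id, None)
        if entry is None:
            return  # block was not sampled
        now = self.clock()
        self.lifetimes.append(now - entry[0])
        self.idle_before_evict.append(now - entry[1])
```

Injecting the clock and the random source keeps the sketch deterministic under test; with `sample_rate=0.0` no block is ever tracked, so every hook returns after a single dictionary lookup.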
-