- 04 Feb, 2026 9 commits
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Or Ozeri authored
Fixes a not-yet-reported case where an abort could free blocks before an in-flight async transfer had completed, corrupting KV data.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
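The class of bug described above can be illustrated with a minimal sketch, assuming a hypothetical block pool that tracks in-flight transfers per block. An abort does not return a block to the free set while a transfer still uses it; the free is deferred until the transfer finishes. `HypotheticalBlockPool` and its methods are illustrative names, not vLLM's actual API or the code in this commit.

```python
from dataclasses import dataclass, field


@dataclass
class HypotheticalBlockPool:
    """Toy pool: aborted blocks are only reused once no async transfer touches them."""

    free_blocks: set[int] = field(default_factory=set)
    # block id -> number of in-flight async transfers reading/writing that block
    inflight: dict[int, int] = field(default_factory=dict)
    # blocks whose request was aborted while a transfer was still in flight
    pending_free: set[int] = field(default_factory=set)

    def start_transfer(self, block_ids: list[int]) -> None:
        for b in block_ids:
            self.inflight[b] = self.inflight.get(b, 0) + 1

    def finish_transfer(self, block_ids: list[int]) -> None:
        # Assumes every finish_transfer matches an earlier start_transfer.
        for b in block_ids:
            self.inflight[b] -= 1
            if self.inflight[b] == 0:
                del self.inflight[b]
                if b in self.pending_free:
                    # The transfer is done, so the deferred free is now safe.
                    self.pending_free.discard(b)
                    self.free_blocks.add(b)

    def abort(self, block_ids: list[int]) -> None:
        for b in block_ids:
            if b in self.inflight:
                # Freeing now would let another request overwrite data
                # the transfer is still reading or writing.
                self.pending_free.add(b)
            else:
                self.free_blocks.add(b)
```

For example, after `start_transfer([1, 2])` and `abort([1, 2, 3])`, only block 3 is free; blocks 1 and 2 become free once `finish_transfer([1, 2])` runs.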
-
zhanqiuhu authored
Add labeled Prometheus metrics to distinguish where prompt tokens come from in prefill/decode (P/D) disaggregated deployments.
In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode instances report inflated prompt throughput because they count all prompt tokens as "computed", even though most were transferred. This PR adds labeled metrics so users can tell actual compute work apart from transferred work:
vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache
vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
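The attribution behind these labels can be sketched in plain Python, assuming each prompt token of a request belongs to exactly one source, so that locally computed tokens are whatever is left after cache hits and transferred tokens. The sketch deliberately ignores the edge case mentioned in the truncated `prompt_tokens_cached_total` comment. `split_prompt_tokens`, `record_request`, and the use of `collections.Counter` in place of a real Prometheus counter are assumptions for illustration, not the PR's implementation.

```python
from collections import Counter


def split_prompt_tokens(
    num_prompt_tokens: int, num_local_cache_hit: int, num_external_kv: int
) -> dict[str, int]:
    """Attribute each prompt token of one request to a single `source` label (assumed partition)."""
    if num_local_cache_hit < 0 or num_external_kv < 0:
        raise ValueError("token counts must be non-negative")
    local_compute = num_prompt_tokens - num_local_cache_hit - num_external_kv
    if local_compute < 0:
        raise ValueError("cached + transferred tokens exceed the prompt length")
    return {
        "local_compute": local_compute,
        "external_kv_transfer": num_external_kv,
        "local_cache_hit": num_local_cache_hit,
    }


# Running totals keyed by the `source` label, standing in for a labeled counter.
prompt_tokens_by_source_total: Counter[str] = Counter()


def record_request(
    num_prompt_tokens: int, num_local_cache_hit: int, num_external_kv: int
) -> None:
    """Add one finished request's per-source token counts to the running totals."""
    prompt_tokens_by_source_total.update(
        split_prompt_tokens(num_prompt_tokens, num_local_cache_hit, num_external_kv)
    )
```

On a decode instance, a 1000-token prompt with 992 tokens received via KV transfer would then add only 8 tokens to `local_compute`, which is the throughput the PR description says is currently overstated.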
-
Frank Wang authored
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
-
R3hankhan authored
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
-
Andrew Xia authored
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
-
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
wang.yuqi authored
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
-
- 03 Feb, 2026 11 commits
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
Patrick von Platen authored
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Richard Zou authored
Signed-off-by: Richard Zou <zou3519@gmail.com>
-
shaharmor98 authored
-
zxy authored
Signed-off-by: zxy <zhou0493@e.ntu.edu.sg>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
杨朱 · Kiki authored
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
-
Daniel Mescheder authored
Signed-off-by: Daniel Mescheder <dmesch@amazon.com>
Co-authored-by: Daniel Mescheder <dmesch@amazon.com>
-
- 02 Feb, 2026 16 commits
Patrick von Platen authored
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
Vasiliy Kuznetsov authored
Signed-off-by: vasiliy <vasiliy@fb.com>
-
yugong333 authored
Reduce kernel overhead when the number of active LoRAs is smaller than `max_loras` by capturing CUDA graphs for each number of active LoRAs. (#32005)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
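The dispatch idea can be sketched in plain Python under stated assumptions: one graph is "captured" per size in a list of capture sizes, and at run time the smallest captured size that fits the current number of active LoRAs is replayed. `make_fake_graph` is a hypothetical stand-in for real CUDA graph capture, and rounding up with `bisect` is an assumption; the commit message only says graphs are captured per number of active LoRAs.

```python
import bisect
from typing import Callable


def make_fake_graph(num_lora_slots: int) -> Callable[[list[int]], str]:
    """Stand-in for a CUDA graph captured with `num_lora_slots` LoRA slots (hypothetical)."""

    def replay(active_lora_ids: list[int]) -> str:
        # A real captured graph would replay kernels sized for `num_lora_slots`;
        # sizing it to the active count avoids work for unused slots.
        return f"replayed graph for {num_lora_slots} LoRA slot(s), {len(active_lora_ids)} active"

    return replay


class LoraGraphDispatcher:
    """Pick the smallest pre-captured graph that fits the current number of active LoRAs."""

    def __init__(self, capture_sizes: list[int]):
        self.sizes = sorted(set(capture_sizes))
        # "Capture" one graph per size up front, mirroring startup-time graph capture.
        self.graphs = {n: make_fake_graph(n) for n in self.sizes}

    def select_size(self, num_active: int) -> int:
        # Index of the first captured size >= num_active.
        i = bisect.bisect_left(self.sizes, num_active)
        if i == len(self.sizes):
            raise ValueError(f"{num_active} active LoRAs exceeds the largest captured size")
        return self.sizes[i]

    def run(self, active_lora_ids: list[int]) -> str:
        return self.graphs[self.select_size(len(active_lora_ids))](active_lora_ids)
```

Passing `range(1, max_loras + 1)` as the capture sizes gives every active count an exact match; a sparser list such as `[1, 2, 4, 8]` uses less capture memory at the cost of some padding.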
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
shanjiaz authored
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
-
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
danielafrimi authored
Signed-off-by: dafrimi <dafrimi@nvidia.com>
-
Nicolò Lucchesi authored
[CI][Bugfix] Fix flaky `tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_example_connector_consistency` (#33555)
Signed-off-by: NickLucche <nlucches@redhat.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
RED authored
Signed-off-by: liuli <ll407707@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
jack authored
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
-
Robert Shaw authored
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
-
csy0225 authored
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com>
Co-authored-by: xiewuxun <xiewuxun@stepfun.com>
Co-authored-by: zetaohong <i-hongzetao@stepfun.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
-
Yifan Qiao authored
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
-
Runkai Tao authored
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
-
- 31 Jan, 2026 4 commits
Roy Wang authored
Signed-off-by: esmeetu <jasonailu87@gmail.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
jma99_2333 authored
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
-
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 authored
Signed-off-by: Hollow Man <hollowman@opensuse.org>
-