- 04 Feb, 2026 6 commits
-
-
jiangkuaixue123 authored
Signed-off-by:jiangkuaixue123 <jiangxiaozhou111@163.com>
-
Wentao Ye authored
Signed-off-by:
yewentao256 <zhyanwentao@126.com> Signed-off-by:
Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by:
Nick Hill <nhill@redhat.com>
-
Or Ozeri authored
Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data. Signed-off-by:Or Ozeri <oro@il.ibm.com>
-
zhanqiuhu authored
Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments. In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred. This PR adds labeled metrics so users can understand actual compute work vs transferred work: vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
-
Frank Wang authored
Signed-off-by:frankwang28 <frank.wbb@hotmail.com>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
- 03 Feb, 2026 6 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
Kuntai Du authored
[Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer (#33377) Signed-off-by:KuntaiDu <kuntai@uchicago.edu>
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Kunshang Ji authored
Signed-off-by:Kunshang Ji <kunshang.ji@intel.com>
-
- 02 Feb, 2026 6 commits
-
-
yugong333 authored
Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005) Signed-off-by:Yu Gong <yu3.gong@gmail.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Rabi Mishra authored
Signed-off-by:rabi <ramishra@redhat.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Yifan Qiao authored
Signed-off-by:Yifan Qiao <yifanqiao@berkeley.edu>
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
- 01 Feb, 2026 6 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Komal Kumar Teru authored
Signed-off-by:kkt-cohere <komal@cohere.com>
-
Cyrus Leung authored
Signed-off-by:
DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by:
YunzhuLu <lucia.yunzhu@gmail.com>
-
Cyrus Leung authored
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
- 31 Jan, 2026 15 commits
-
-
smashyalts authored
Signed-off-by:smashyalts <smashyalts@gmail.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
YunzhuLu authored
Signed-off-by:
YunzhuLu <lucia.yunzhu@gmail.com> Co-authored-by:
Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by:
Roger Wang <hey@rogerw.io>
-
jma99_2333 authored
Signed-off-by:
Roger Wang <hey@rogerw.io> Co-authored-by:
Roger Wang <hey@rogerw.io>
-
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 authored
Signed-off-by:Hollow Man <hollowman@opensuse.org>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
caozuoba authored
Signed-off-by:hdj <1293066020@qq.com>
-
jennyyyyzhen authored
Signed-off-by:jennyyyyzhen <yzhen@hmc.edu>
-
Francesco Fusco authored
Signed-off-by:Francesco Fusco <ffu@zurich.ibm.com>
-
Russell Bryant authored
Signed-off-by:Russell Bryant <rbryant@redhat.com>
-
Lucas Wilkinson authored
Signed-off-by:
Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by:
Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by:
Michael Goin <mgoin64@gmail.com>
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Matthew Bonanni authored
Signed-off-by:Matthew Bonanni <mbonanni@redhat.com>
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Alberto Ferrer authored
Signed-off-by:Alberto Ferrer <albertof@barrahome.org>
-
- 30 Jan, 2026 1 commit
-
-
Kyle Sayers authored
Signed-off-by:Kyle Sayers <kylesayrs@gmail.com>
-