- 06 Feb, 2026 4 commits
-
-
Xinyu Chen authored
Signed-off-by:Xinyu Chen <xinyu1.chen@intel.com>
-
Gassan Salama authored
Signed-off-by:Gassan <gassan.salama@arm.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
emricksini-h authored
-
- 05 Feb, 2026 9 commits
-
-
Benjamin Chislett authored
Signed-off-by:Benjamin Chislett <bchislett@nvidia.com>
-
Aaron Hao authored
Signed-off-by:
ahao-anyscale <ahao@anyscale.com> Signed-off-by:
Aaron Hao <ahao@anyscale.com> Co-authored-by:
SumanthRH <sumanthrh99@gmail.com>
-
Cyrus Leung authored
Signed-off-by:
DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by:
wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by:
wang.yuqi <yuqi.wang@daocloud.io>
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Chauncey authored
Signed-off-by:chaunceyjiang <chaunceyjiang@gmail.com>
-
Li, Jiang authored
Signed-off-by:jiang1.li <jiang1.li@intel.com>
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Luka Govedič authored
-
zhanqiuhu authored
Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
-
- 04 Feb, 2026 10 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Taeksang Kim authored
Signed-off-by:Taeksang Kim <ts.kim@hyperaccel.ai>
-
kourosh hakhamaneshi authored
Signed-off-by:Kourosh Hakhamaneshi <kourosh@anyscale.com>
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
jiangkuaixue123 authored
Signed-off-by:jiangkuaixue123 <jiangxiaozhou111@163.com>
-
Wentao Ye authored
Signed-off-by:
yewentao256 <zhyanwentao@126.com> Signed-off-by:
Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by:
Nick Hill <nhill@redhat.com>
-
Or Ozeri authored
Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data. Signed-off-by:Or Ozeri <oro@il.ibm.com>
-
zhanqiuhu authored
Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments. In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred. This PR adds labeled metrics so users can understand actual compute work vs transferred work: vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
-
Frank Wang authored
Signed-off-by:frankwang28 <frank.wbb@hotmail.com>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
- 03 Feb, 2026 6 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
Kuntai Du authored
[Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer (#33377) Signed-off-by:KuntaiDu <kuntai@uchicago.edu>
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Kunshang Ji authored
Signed-off-by:Kunshang Ji <kunshang.ji@intel.com>
-
- 02 Feb, 2026 6 commits
-
-
yugong333 authored
Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005) Signed-off-by:Yu Gong <yu3.gong@gmail.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Rabi Mishra authored
Signed-off-by:rabi <ramishra@redhat.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Yifan Qiao authored
Signed-off-by:Yifan Qiao <yifanqiao@berkeley.edu>
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
- 01 Feb, 2026 5 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Komal Kumar Teru authored
Signed-off-by:kkt-cohere <komal@cohere.com>
-
Cyrus Leung authored
Signed-off-by:
DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by:
YunzhuLu <lucia.yunzhu@gmail.com>
-
Cyrus Leung authored
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-