- 12 Feb, 2026 2 commits
-
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
- 11 Feb, 2026 2 commits
-
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
- 10 Feb, 2026 5 commits
-
-
Pavani Majety authored
Signed-off-by:Pavani Majety <pmajety@nvidia.com>
-
junuxyz authored
Signed-off-by:junuxyz <216036880+junuxyz@users.noreply.github.com>
-
Krish Gupta authored
Signed-off-by:KrxGu <krishom70@gmail.com>
-
Chen Zhang authored
Signed-off-by:Chen Zhang <zhangch99@outlook.com>
-
Roger Wang authored
Signed-off-by:Roger Wang <hey@rogerw.io>
-
- 07 Feb, 2026 2 commits
-
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Aaron Hao authored
Signed-off-by:
ahao-anyscale <ahao@anyscale.com> Signed-off-by:
Aaron Hao <ahao@anyscale.com> Co-authored-by:
Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
-
- 06 Feb, 2026 2 commits
-
-
Seiji Eicher authored
Signed-off-by:Seiji Eicher <seiji@anyscale.com>
-
emricksini-h authored
-
- 05 Feb, 2026 4 commits
-
-
Benjamin Chislett authored
Signed-off-by:Benjamin Chislett <bchislett@nvidia.com>
-
liranschour authored
Signed-off-by:
Liran Schour <lirans@il.ibm.com> Signed-off-by:
liranschour <liranschour@users.noreply.github.com> Co-authored-by:
Or Ozeri <or@ozery.com> Co-authored-by:
Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Co-authored-by:
Nicolò Lucchesi <nlucches@redhat.com>
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
- 04 Feb, 2026 4 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Or Ozeri authored
Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data. Signed-off-by:Or Ozeri <oro@il.ibm.com>
-
zhanqiuhu authored
Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments. In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred. This PR adds labeled metrics so users can understand actual compute work vs transferred work: vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
-
Frank Wang authored
Signed-off-by:frankwang28 <frank.wbb@hotmail.com>
-
- 03 Feb, 2026 2 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
- 02 Feb, 2026 4 commits
-
-
yugong333 authored
Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005) Signed-off-by:Yu Gong <yu3.gong@gmail.com>
-
shanjiaz authored
Signed-off-by:shanjiaz <zsjwpianpian@gmail.com>
-
Nicolò Lucchesi authored
[CI][Bugfix] Fix flaky `tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_example_connector_consistency` (#33555) Signed-off-by:NickLucche <nlucches@redhat.com>
-
Yifan Qiao authored
Signed-off-by:Yifan Qiao <yifanqiao@berkeley.edu>
-
- 31 Jan, 2026 3 commits
-
-
jma99_2333 authored
Signed-off-by:
Roger Wang <hey@rogerw.io> Co-authored-by:
Roger Wang <hey@rogerw.io>
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Matthew Bonanni authored
Signed-off-by:Matthew Bonanni <mbonanni@redhat.com>
-
- 30 Jan, 2026 3 commits
-
-
Kyle Sayers authored
Signed-off-by:Kyle Sayers <kylesayrs@gmail.com>
-
Frank Wang authored
Signed-off-by:
frankwang28 <frank.wbb@hotmail.com> Signed-off-by:
Frank Wang <41319051+frankwang28@users.noreply.github.com> Co-authored-by:
Wentao Ye <44945378+yewentao256@users.noreply.github.com>
-
Patrick von Platen authored
Signed-off-by:
Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by:
Nick Hill <nickhill123@gmail.com>
-
- 29 Jan, 2026 2 commits
-
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
shanjiaz authored
Signed-off-by:shanjiaz <zsjwpianpian@gmail.com>
-
- 28 Jan, 2026 2 commits
-
-
Wentao Ye authored
[Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement (#32618) Signed-off-by:
yewentao256 <zhyanwentao@126.com> Signed-off-by:
Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by:
Nick Hill <nickhill123@gmail.com> Co-authored-by:
Nick Hill <nickhill123@gmail.com>
-
Or Ozeri authored
Signed-off-by:
Or Ozeri <oro@il.ibm.com> Co-authored-by:
Kevin H. Luu <khluu000@gmail.com>
-
- 27 Jan, 2026 2 commits
-
-
Matthew Bonanni authored
Signed-off-by:Matthew Bonanni <mbonanni@redhat.com>
-
omerpaz95 authored
Added queries and hits metrics for the Offloading Connector. Also added timing metrics for store and load operations, which take the average time it takes to load/store, per-token. The metrics are available from Prometheus and from the StatLogger. Signed-off-by:
omerpaz95 <omerpaz95@gmail.com> Co-authored-by:
Omer Paz <Omer.Paz@ibm.com>
-
- 26 Jan, 2026 1 commit
-
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-