- 05 Feb, 2026 13 commits
-
-
Aaron Hao authored
Signed-off-by:
ahao-anyscale <ahao@anyscale.com> Signed-off-by:
Aaron Hao <ahao@anyscale.com> Co-authored-by:
SumanthRH <sumanthrh99@gmail.com>
-
Mario Hong authored
Signed-off-by:mariohong <mariohong128@gmail.com>
-
wang.yuqi authored
Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
-
liranschour authored
Signed-off-by:
Liran Schour <lirans@il.ibm.com> Signed-off-by:
liranschour <liranschour@users.noreply.github.com> Co-authored-by:
Or Ozeri <or@ozery.com> Co-authored-by:
Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Co-authored-by:
Nicolò Lucchesi <nlucches@redhat.com>
-
Andreas Karatzas authored
Signed-off-by:
Andreas Karatzas <akaratza@amd.com> Signed-off-by:
Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by:
Matthew Wong <Matthew.Wong2@amd.com>
-
Cyrus Leung authored
Signed-off-by:
DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by:
wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by:
wang.yuqi <yuqi.wang@daocloud.io>
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Andreas Karatzas authored
Signed-off-by:
Andreas Karatzas <akaratza@amd.com> Signed-off-by:
wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by:
wang.yuqi <yuqi.wang@daocloud.io>
-
rasmith authored
[CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly (#33840) Signed-off-by:Randall Smith <Randall.Smith@amd.com>
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Andreas Karatzas authored
Signed-off-by:Andreas Karatzas <akaratza@amd.com>
-
Luka Govedič authored
Signed-off-by:
Luka Govedič <lgovedic@redhat.com> Signed-off-by:
ProExpertProg <luka.govedic@gmail.com> Signed-off-by:
Luka Govedič <ProExpertProg@users.noreply.github.com>
-
Ilya Boytsov authored
Signed-off-by:
Ilya Boytsov <ilyaboytsov1805@gmail.com> Signed-off-by:
Ilya Boytsov <boytsovpanamera@mail.ru> Co-authored-by:
Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by:
wang.yuqi <yuqi.wang@daocloud.io>
-
- 04 Feb, 2026 13 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Richard Zou authored
Signed-off-by:Richard Zou <zou3519@gmail.com>
-
Simon Danielsson authored
Signed-off-by:simondanielsson <simon.danielsson99@hotmail.com>
-
kourosh hakhamaneshi authored
Signed-off-by:Kourosh Hakhamaneshi <kourosh@anyscale.com>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Or Ozeri authored
Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data. Signed-off-by:Or Ozeri <oro@il.ibm.com>
-
zhanqiuhu authored
Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments. In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred. This PR adds labeled metrics so users can understand actual compute work vs transferred work: vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
-
Frank Wang authored
Signed-off-by:frankwang28 <frank.wbb@hotmail.com>
-
R3hankhan authored
Signed-off-by:Rehan Khan <Rehan.Khan7@ibm.com>
-
Andrew Xia authored
Signed-off-by:
Andrew Xia <axia@fb.com> Co-authored-by:
Andrew Xia <axia@fb.com>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
wang.yuqi authored
Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
-
- 03 Feb, 2026 11 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Patrick von Platen authored
Signed-off-by:Patrick von Platen <patrick.v.platen@gmail.com>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Richard Zou authored
Signed-off-by:Richard Zou <zou3519@gmail.com>
-
shaharmor98 authored
-
zxy authored
Signed-off-by:
zxy <zhou0493@e.ntu.edu.sg> Signed-off-by:
Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by:
Isotr0py <mozf@mail2.sysu.edu.cn>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
杨朱 · Kiki authored
Signed-off-by:
carlory <baofa.fan@daocloud.io> Co-authored-by:
Claude Opus 4.5 <noreply@anthropic.com>
-
Daniel Mescheder authored
Signed-off-by:
Daniel Mescheder <dmesch@amazon.com> Co-authored-by:
Daniel Mescheder <dmesch@amazon.com>
-
- 02 Feb, 2026 3 commits
-
-
Patrick von Platen authored
Signed-off-by:
Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
Vasiliy Kuznetsov authored
Signed-off-by:vasiliy <vasiliy@fb.com>
-
yugong333 authored
Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005) Signed-off-by:Yu Gong <yu3.gong@gmail.com>
-