- 06 Feb, 2026 6 commits
-
-
Luka Govedič authored
Signed-off-by:Luka Govedič <lgovedic@redhat.com>
Signed-off-by:ProExpertProg <luka.govedic@gmail.com>
Signed-off-by:Luka Govedič <ProExpertProg@users.noreply.github.com>
-
Xinyu Chen authored
Signed-off-by:Xinyu Chen <xinyu1.chen@intel.com>
-
chengchengpei authored
Signed-off-by:Chengcheng Pei <chengchengpei@outlook.com>
Signed-off-by:chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by:chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by:gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
Mingliang Li authored
Signed-off-by:limingliang <limingliang@stepfun.com>
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by:limingliang <limingliang@stepfun.com>
Co-authored-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Xin Yang authored
-
emricksini-h authored
-
- 05 Feb, 2026 18 commits
-
-
Hashem Hashemi authored
Signed-off-by:Hashem Hashemi <hashem.hashemi@amd.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
bnellnm authored
Signed-off-by:Bill Nell <bnell@redhat.com>
Co-authored-by:Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
-
Benjamin Chislett authored
Signed-off-by:Benjamin Chislett <bchislett@nvidia.com>
-
Aaron Hao authored
Signed-off-by:ahao-anyscale <ahao@anyscale.com>
Signed-off-by:Aaron Hao <ahao@anyscale.com>
Co-authored-by:SumanthRH <sumanthrh99@gmail.com>
-
Mario Hong authored
Signed-off-by:mariohong <mariohong128@gmail.com>
-
wang.yuqi authored
Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
-
liranschour authored
Signed-off-by:Liran Schour <lirans@il.ibm.com>
Signed-off-by:liranschour <liranschour@users.noreply.github.com>
Co-authored-by:Or Ozeri <or@ozery.com>
Co-authored-by:Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by:Nicolò Lucchesi <nlucches@redhat.com>
-
Andreas Karatzas authored
Signed-off-by:Andreas Karatzas <akaratza@amd.com>
Signed-off-by:Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by:Matthew Wong <Matthew.Wong2@amd.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by:wang.yuqi <yuqi.wang@daocloud.io>
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Andreas Karatzas authored
Signed-off-by:Andreas Karatzas <akaratza@amd.com>
Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by:wang.yuqi <yuqi.wang@daocloud.io>
-
rasmith authored
[CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly (#33840)
Signed-off-by:Randall Smith <Randall.Smith@amd.com>
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Andreas Karatzas authored
Signed-off-by:Andreas Karatzas <akaratza@amd.com>
-
Luka Govedič authored
Signed-off-by:Luka Govedič <lgovedic@redhat.com>
Signed-off-by:ProExpertProg <luka.govedic@gmail.com>
Signed-off-by:Luka Govedič <ProExpertProg@users.noreply.github.com>
-
Ilya Boytsov authored
Signed-off-by:Ilya Boytsov <ilyaboytsov1805@gmail.com>
Signed-off-by:Ilya Boytsov <boytsovpanamera@mail.ru>
Co-authored-by:Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by:wang.yuqi <yuqi.wang@daocloud.io>
-
- 04 Feb, 2026 13 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Richard Zou authored
Signed-off-by:Richard Zou <zou3519@gmail.com>
-
Simon Danielsson authored
Signed-off-by:simondanielsson <simon.danielsson99@hotmail.com>
-
kourosh hakhamaneshi authored
Signed-off-by:Kourosh Hakhamaneshi <kourosh@anyscale.com>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Or Ozeri authored
Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data.
Signed-off-by:Or Ozeri <oro@il.ibm.com>
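The race described in this commit message can be illustrated with a minimal sketch. It is not the code from the commit: the `BlockPool` class and all of its method names are hypothetical, and the sketch only assumes that an abort arriving while a transfer is in flight should defer the free until the transfer finishes.

```python
class BlockPool:
    """Hypothetical block pool; names are illustrative, not vLLM's API."""

    def __init__(self, num_blocks):
        self.free_blocks = set(range(num_blocks))
        self.owned = {}            # request id -> set of block ids
        self.in_flight = set()     # requests with an unfinished async transfer
        self.pending_free = set()  # requests aborted while a transfer was running

    def allocate(self, req_id, n):
        if n > len(self.free_blocks):
            raise RuntimeError("not enough free blocks")
        self.owned[req_id] = {self.free_blocks.pop() for _ in range(n)}
        return self.owned[req_id]

    def start_transfer(self, req_id):
        self.in_flight.add(req_id)

    def abort(self, req_id):
        """Free immediately only if no transfer is still touching the blocks."""
        if req_id in self.in_flight:
            self.pending_free.add(req_id)  # defer the free
            return False
        self._free(req_id)
        return True

    def finish_transfer(self, req_id):
        self.in_flight.discard(req_id)
        if req_id in self.pending_free:
            self.pending_free.discard(req_id)
            self._free(req_id)  # safe now: the transfer is complete

    def _free(self, req_id):
        self.free_blocks |= self.owned.pop(req_id, set())


pool = BlockPool(4)
pool.allocate("r1", 2)
pool.start_transfer("r1")
freed_now = pool.abort("r1")          # abort arrives mid-transfer
free_during = len(pool.free_blocks)   # blocks are still held
pool.finish_transfer("r1")
free_after = len(pool.free_blocks)    # blocks returned after completion
```

Deferring the free keeps another request from reusing the blocks while the transfer is still writing to them, which is the corruption scenario the message describes.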
-
zhanqiuhu authored
Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments.
In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred.
This PR adds labeled metrics so users can understand actual compute work vs transferred work:
vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache
vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all
Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
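A rough sketch of how such a per-source breakdown could be accumulated and rendered is given here. The metric name and the three `source` label values come from the commit message; the helper functions, their parameters, and the token counts are assumptions made for illustration and do not reflect the actual implementation.

```python
from collections import Counter

# Label values taken from the commit message above.
SOURCES = ("local_compute", "external_kv_transfer", "local_cache_hit")


def split_prompt_tokens(num_prompt_tokens, num_local_cached, num_external):
    """Hypothetical helper: attribute one request's prompt tokens to a source."""
    if min(num_prompt_tokens, num_local_cached, num_external) < 0:
        raise ValueError("token counts must be non-negative")
    computed = num_prompt_tokens - num_local_cached - num_external
    if computed < 0:
        raise ValueError("cached + transferred tokens exceed the prompt length")
    return {
        "local_compute": computed,
        "external_kv_transfer": num_external,
        "local_cache_hit": num_local_cached,
    }


def render(totals):
    """Render the running totals in Prometheus text exposition style."""
    return [
        f'vllm:prompt_tokens_by_source_total{{source="{s}"}} {totals[s]}'
        for s in SOURCES
    ]


totals = Counter()
# A decode instance that receives most of a 1000-token prompt via KV transfer.
totals.update(split_prompt_tokens(1000, num_local_cached=64, num_external=900))
# A request that is prefilled entirely locally.
totals.update(split_prompt_tokens(200, num_local_cached=0, num_external=0))
lines = render(totals)
```

With these made-up numbers, only 236 of the 1200 prompt tokens are attributed to `local_compute`, which is the distinction the commit message says a single unlabeled counter hides.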
-
Frank Wang authored
Signed-off-by:frankwang28 <frank.wbb@hotmail.com>
-
R3hankhan authored
Signed-off-by:Rehan Khan <Rehan.Khan7@ibm.com>
-
Andrew Xia authored
Signed-off-by:Andrew Xia <axia@fb.com>
Co-authored-by:Andrew Xia <axia@fb.com>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
wang.yuqi authored
Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
-
- 03 Feb, 2026 3 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Patrick von Platen authored
Signed-off-by:Patrick von Platen <patrick.v.platen@gmail.com>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-