- 05 Feb, 2026 4 commits
-
-
Chauncey authored
Signed-off-by:chaunceyjiang <chaunceyjiang@gmail.com>
-
zhanqiuhu authored
Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
-
Luka Govedič authored
Signed-off-by:
Luka Govedič <lgovedic@redhat.com> Signed-off-by:
ProExpertProg <luka.govedic@gmail.com> Signed-off-by:
Luka Govedič <ProExpertProg@users.noreply.github.com>
-
Ilya Boytsov authored
Signed-off-by:
Ilya Boytsov <ilyaboytsov1805@gmail.com> Signed-off-by:
Ilya Boytsov <boytsovpanamera@mail.ru> Co-authored-by:
Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by:
wang.yuqi <yuqi.wang@daocloud.io>
-
- 04 Feb, 2026 36 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Sage Moore authored
Change the type signature of MixtureOfExperts.expert_weights to MutableSequence[Sequence[Tensor]] (#33573) Signed-off-by:
Sage Moore <sagmoore@redhat.com> Co-authored-by:
Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
-
Richard Zou authored
Signed-off-by:Richard Zou <zou3519@gmail.com>
-
Muhammad Hashmi authored
Signed-off-by:
Muhammad Hashmi <mhashmi@berkeley.edu> Signed-off-by:
NickLucche <nlucches@redhat.com> Co-authored-by:
NickLucche <nlucches@redhat.com>
-
Simon Danielsson authored
Signed-off-by:simondanielsson <simon.danielsson99@hotmail.com>
-
Taeksang Kim authored
Signed-off-by:Taeksang Kim <ts.kim@hyperaccel.ai>
-
kourosh hakhamaneshi authored
Signed-off-by:Kourosh Hakhamaneshi <kourosh@anyscale.com>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
jiangkuaixue123 authored
Signed-off-by:jiangkuaixue123 <jiangxiaozhou111@163.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Wentao Ye authored
Signed-off-by:
yewentao256 <zhyanwentao@126.com> Signed-off-by:
Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by:
Nick Hill <nhill@redhat.com>
-
Micah Williamson authored
Signed-off-by:Micah Williamson <micah.williamson@amd.com>
-
Cyrus Leung authored
Signed-off-by:
Zachary Aristei <zaristei@nvidia.com> Co-authored-by:
zaristei2 <zaristei2@gmail.com> Co-authored-by:
Zachary Aristei <zaristei@nvidia.com>
-
Chauncey authored
Signed-off-by:chaunceyjiang <chaunceyjiang@gmail.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Yueqian Lin authored
Signed-off-by:
linyueqian <linyueqian@outlook.com> Signed-off-by:
Roger Wang <hey@rogerw.io> Co-authored-by:
Roger Wang <hey@rogerw.io>
-
Vadim Gimpelson authored
Signed-off-by:Vadim Gimpelson <vadim.gimpelson@gmail.com>
-
Or Ozeri authored
Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data. Signed-off-by:Or Ozeri <oro@il.ibm.com>
-
Zhengxu Chen authored
Signed-off-by:zhxchen17 <zhxchen17@fb.com>
-
Zhengxu Chen authored
Signed-off-by:zhxchen17 <zhxchen17@fb.com>
-
Kunshang Ji authored
Signed-off-by:Kunshang Ji <kunshang.ji@intel.com>
-
Augusto Yao authored
Signed-off-by:augusto.yjh <augusto.yjh@antgroup.com>
-
Kunshang Ji authored
Signed-off-by:Kunshang Ji <kunshang.ji@intel.com>
-
zhanqiuhu authored
Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments. In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred. This PR adds labeled metrics so users can understand actual compute work vs transferred work: vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
-
Matt authored
Signed-off-by:Matthew Wong <Matthew.Wong2@amd.com>
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Frank Wang authored
Signed-off-by:frankwang28 <frank.wbb@hotmail.com>
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Michael Goin authored
Signed-off-by:mgoin <mgoin64@gmail.com>
-
Huy Do authored
Signed-off-by:Huy Do <huydhn@gmail.com>
-
Shanshan Shen authored
Signed-off-by:shen-shanshan <467638484@qq.com>
-
R3hankhan authored
Signed-off-by:Rehan Khan <Rehan.Khan7@ibm.com>
-
Andrew Xia authored
Signed-off-by:
Andrew Xia <axia@fb.com> Co-authored-by:
Andrew Xia <axia@fb.com>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-