- 04 Feb, 2026 18 commits
-
-
Zhengxu Chen authored
Signed-off-by:zhxchen17 <zhxchen17@fb.com>
-
Zhengxu Chen authored
Signed-off-by:zhxchen17 <zhxchen17@fb.com>
-
Kunshang Ji authored
Signed-off-by:Kunshang Ji <kunshang.ji@intel.com>
-
Augusto Yao authored
Signed-off-by:augusto.yjh <augusto.yjh@antgroup.com>
-
Kunshang Ji authored
Signed-off-by:Kunshang Ji <kunshang.ji@intel.com>
-
zhanqiuhu authored
Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments.
In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred.
This PR adds labeled metrics so users can understand actual compute work vs transferred work:
vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache
vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all
Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
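The per-source accounting described in this commit message can be illustrated with a minimal, dependency-free sketch. It is not the PR's implementation: the `PromptTokenBreakdown` and `LabeledCounter` classes, their field names, and the `record` helper are hypothetical, and the rule that locally computed tokens equal the prompt length minus local cache hits minus externally transferred tokens is an assumption drawn from the label descriptions. Only the metric name and the three `source` label values come from the message itself.

```python
from collections import Counter
from dataclasses import dataclass

# Label values as listed in the commit message.
SOURCES = ("local_compute", "local_cache_hit", "external_kv_transfer")


@dataclass
class PromptTokenBreakdown:
    """Hypothetical per-request token counts (names are illustrative)."""
    num_prompt_tokens: int
    num_local_cached_tokens: int
    num_external_tokens: int

    def by_source(self) -> dict:
        # Assumption: whatever was neither cached locally nor transferred
        # from a prefill instance had to be prefilled on this instance.
        computed = (self.num_prompt_tokens
                    - self.num_local_cached_tokens
                    - self.num_external_tokens)
        if computed < 0:
            raise ValueError("cached + external tokens exceed prompt length")
        return {
            "local_compute": computed,
            "local_cache_hit": self.num_local_cached_tokens,
            "external_kv_transfer": self.num_external_tokens,
        }


class LabeledCounter:
    """Toy stand-in for a Prometheus counter with a single `source` label."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._values = Counter()

    def inc(self, source: str, amount: int) -> None:
        if source not in SOURCES:
            raise KeyError(f"unknown source label: {source}")
        if amount < 0:
            raise ValueError("counters can only increase")
        self._values[source] += amount

    def value(self, source: str) -> int:
        return self._values[source]

    def render(self) -> list:
        # One line per label value, in Prometheus text-format style.
        return [f'{self.name}{{source="{s}"}} {self._values[s]}' for s in SOURCES]


def record(counter: LabeledCounter, breakdown: PromptTokenBreakdown) -> None:
    """Add one request's tokens to the per-source counter."""
    for source, amount in breakdown.by_source().items():
        counter.inc(source, amount)


if __name__ == "__main__":
    counter = LabeledCounter("vllm:prompt_tokens_by_source_total")
    # A decode instance: 1000-token prompt, 900 tokens transferred, 64 cache hits.
    record(counter, PromptTokenBreakdown(1000, 64, 900))
    print("\n".join(counter.render()))
```

In this example the decode instance would report only 36 tokens under `local_compute`, rather than counting all 1000 prompt tokens as computed.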
-
Matt authored
Signed-off-by:Matthew Wong <Matthew.Wong2@amd.com>
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Frank Wang authored
Signed-off-by:frankwang28 <frank.wbb@hotmail.com>
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Michael Goin authored
Signed-off-by:mgoin <mgoin64@gmail.com>
-
Huy Do authored
Signed-off-by:Huy Do <huydhn@gmail.com>
-
Shanshan Shen authored
Signed-off-by:shen-shanshan <467638484@qq.com>
-
R3hankhan authored
Signed-off-by:Rehan Khan <Rehan.Khan7@ibm.com>
-
Andrew Xia authored
Signed-off-by:Andrew Xia <axia@fb.com>
Co-authored-by:Andrew Xia <axia@fb.com>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
wang.yuqi authored
Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
-
- 03 Feb, 2026 22 commits
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Matthew Bonanni authored
Signed-off-by:Matthew Bonanni <mbonanni@redhat.com>
-
Michael Goin authored
[Bugfix] Disable TRTLLM FP8 MoE if router_logits_dtype==float32 and routing_method!=DeepSeekV3 (#33613)
Signed-off-by:mgoin <mgoin64@gmail.com>
-
Patrick von Platen authored
Signed-off-by:Patrick von Platen <patrick.v.platen@gmail.com>
-
Vadim Gimpelson authored
Signed-off-by:Vadim Gimpelson <vadim.gimpelson@gmail.com>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Richard Zou authored
Signed-off-by:Richard Zou <zou3519@gmail.com>
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
dtc authored
Signed-off-by:Tianchen Ding <dtcccc@linux.alibaba.com>
Co-authored-by:Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Patrick von Platen authored
Signed-off-by:Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by:Cyrus Leung <tlleungac@connect.ust.hk>
-
Shanshan Shen authored
Signed-off-by:shen-shanshan <467638484@qq.com>
-
wang.yuqi authored
[Bugfix] Do not add extra \n for image-only cases when constructing multimodal text prompts. (#33647)
Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
-
shaharmor98 authored
-
Kuntai Du authored
[Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer (#33377)
Signed-off-by:KuntaiDu <kuntai@uchicago.edu>
-
Krish Gupta authored
Signed-off-by:KrxGu <krishom70@gmail.com>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
zxy authored
Signed-off-by:zxy <zhou0493@e.ntu.edu.sg>
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Song Zhixin authored
Signed-off-by:jesse <szxfml@gmail.com>
Signed-off-by:Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by:Cyrus Leung <cyrus.tl.leung@gmail.com>
-