- 14 Apr, 2026 1 commit
-
-
Mark McLoughlin authored
[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens with PrefillStats (#37460)
Related to `Counters can only be incremented by non-negative amounts` error with the `vllm:prompt_tokens_by_source_total` metric.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Or Ozeri <or@ozery.com>
-
- 12 Apr, 2026 1 commit
-
-
Martin Hickey authored
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Or Ozeri <or@ozery.com>
-
- 09 Apr, 2026 1 commit
-
-
wang.yuqi authored
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
-
- 08 Apr, 2026 1 commit
-
-
triangleXIV authored
[BugFix] --max-model-len=-1 causes over-limit requests to hang and starve the entire service (#39102)
Signed-off-by: triangle14 <y1019026570@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
-
- 28 Mar, 2026 1 commit
-
-
yzong-rh authored
Signed-off-by: Yifan <yzong@redhat.com>
-
- 13 Mar, 2026 2 commits
-
-
Itay Alroy authored
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
-
Sage authored
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
-
- 10 Mar, 2026 1 commit
-
-
Wentao Ye authored
[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
- 07 Mar, 2026 1 commit
-
-
lif authored
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
-
- 05 Mar, 2026 1 commit
-
-
Ning Xie authored
Signed-off-by: Andy Xie <andy.xning@gmail.com>
-
- 25 Feb, 2026 1 commit
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
- 17 Feb, 2026 1 commit
-
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 13 Feb, 2026 2 commits
-
-
Aaron Hao authored
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 11 Feb, 2026 1 commit
-
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 07 Feb, 2026 1 commit
-
-
Aaron Hao authored
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
-
- 31 Jan, 2026 1 commit
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
- 26 Jan, 2026 1 commit
-
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 22 Jan, 2026 1 commit
-
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 15 Jan, 2026 3 commits
-
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
dtc authored
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
-
Chauncey authored
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
-
- 13 Jan, 2026 2 commits
-
-
Chauncey authored
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
-
Andreas Karatzas authored
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
-
- 08 Jan, 2026 1 commit
-
-
Ryan Rock authored
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
-
- 06 Jan, 2026 1 commit
-
-
John Calderon authored
Signed-off-by: John Calderon <jcalderon@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
-
- 02 Jan, 2026 1 commit
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: njhill <nickhill123@gmail.com>
-
- 30 Dec, 2025 1 commit
-
-
Sage authored
[Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com>
-
- 26 Dec, 2025 1 commit
-
-
Kunshang Ji authored
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
-
- 23 Dec, 2025 2 commits
-
-
Mark McLoughlin authored
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
-
Divakar Verma authored
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
-
- 19 Dec, 2025 2 commits
-
-
Seiji Eicher authored
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
-
Nick Hill authored
Signed-off-by: Nick Hill <nhill@redhat.com>
-
- 18 Dec, 2025 1 commit
-
-
inkcherry authored
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
-
- 10 Dec, 2025 1 commit
-
-
shivampr authored
Signed-off-by: Shivam <shivamprasad91@gmail.com>
Signed-off-by: shivampr <shivampr.dev@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
-
- 09 Dec, 2025 1 commit
-
-
Or Ozeri authored
Signed-off-by: Or Ozeri <oro@il.ibm.com>
-
- 07 Dec, 2025 2 commits
-
-
Cyrus Leung authored
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 05 Dec, 2025 1 commit
-
-
Nick Hill authored
Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a KV connector is involved, the connector treats the request as having completed successfully rather than as aborted.

This is problematic for disaggregated prefill, which frees KV cache blocks if the request was aborted but not if it completed successfully. Since the cancelled request will never be sent to the decode side, its KV cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger).

This PR fixes the problem by processing pending aborts immediately prior to processing model output each step. Only aborts are processed at that point, not new requests, since it is preferable for latency to process model outputs before new incoming requests.

Fixes #26400.
Signed-off-by: Nick Hill <nhill@redhat.com>
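The ordering this commit message describes can be illustrated with a minimal, self-contained sketch. It is not the vLLM implementation: `ToyEngine`, `Request`, the `inbox` queue, and the message tuples are all hypothetical names introduced only for illustration. The sketch assumes that aborts and new requests arrive on one queue, and that aborts for running requests are applied just before the outputs of a finished forward pass are handled, while every other message stays queued in its original order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request record, not a vLLM type.
    req_id: str
    max_tokens: int
    tokens: list = field(default_factory=list)
    status: str = "running"  # "running" | "finished" | "aborted"


class ToyEngine:
    """Toy engine loop; all names here are illustrative assumptions."""

    def __init__(self):
        self.inbox = deque()  # pending ("add", Request) / ("abort", req_id)
        self.running = {}     # req_id -> Request

    def _drain_aborts(self):
        # Apply aborts that target running requests; keep everything else
        # (new requests, aborts for not-yet-running requests) in order.
        kept = deque()
        while self.inbox:
            kind, payload = self.inbox.popleft()
            if kind == "abort" and payload in self.running:
                self.running.pop(payload).status = "aborted"
            else:
                kept.append((kind, payload))
        self.inbox = kept

    def process_output(self, new_tokens):
        # new_tokens: req_id -> token produced by the step that just ended.
        self._drain_aborts()  # aborts first, so they win over stop checks
        finished = []
        for req_id, token in new_tokens.items():
            req = self.running.get(req_id)
            if req is None:
                continue  # aborted while the forward pass was in flight
            req.tokens.append(token)
            if len(req.tokens) >= req.max_tokens:  # normal "length" stop
                req.status = "finished"
                del self.running[req_id]
                finished.append(req_id)
        return finished


# A request is cancelled during what would have been its final step.
engine = ToyEngine()
cancelled = Request("a", max_tokens=1)
engine.running["a"] = cancelled
engine.inbox.append(("abort", "a"))
engine.inbox.append(("add", Request("b", max_tokens=4)))
done = engine.process_output({"a": 42})
print(done, cancelled.status, len(engine.inbox))
```

In this sketch the cancelled request ends up `aborted` instead of `finished`, which is the signal a KV connector would need in order to release blocks promptly, and the queued new request is left untouched until regular input processing.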
-
- 03 Dec, 2025 1 commit
-
-
Lumis Chen authored
Signed-off-by: LuminolT <lumischen01@gmail.com>
Signed-off-by: Lumis Chen <lumischen01@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
-