- 27 Feb, 2026 1 commit
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
- 13 Feb, 2026 2 commits
-
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
-
Jaewon authored
Signed-off-by: Jaewon Lee <jaewon@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
-
- 11 Feb, 2026 1 commit
-
-
bnellnm authored
[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344)
Signed-off-by: Bill Nell <bnell@redhat.com>
-
- 10 Feb, 2026 2 commits
-
-
Ilya Markov authored
Signed-off-by: ilmarkov <markovilya197@gmail.com>
-
J Seppänen authored
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
-
- 06 Feb, 2026 2 commits
-
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
emricksini-h authored
-
- 05 Feb, 2026 1 commit
-
-
Aaron Hao authored
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
-
- 04 Feb, 2026 1 commit
-
-
kourosh hakhamaneshi authored
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
-
- 31 Jan, 2026 1 commit
-
-
jma99_2333 authored
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
-
- 30 Jan, 2026 1 commit
-
-
Kyle Sayers authored
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
-
- 29 Jan, 2026 1 commit
-
-
Chendi.Xue authored
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
-
- 28 Jan, 2026 1 commit
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
- 24 Jan, 2026 1 commit
-
-
Reagan Lee authored
Signed-off-by: Reagan <reaganjlee@gmail.com>
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Co-authored-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com>
-
- 23 Jan, 2026 1 commit
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
- 19 Jan, 2026 1 commit
-
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 14 Jan, 2026 1 commit
-
-
Shanshan Shen authored
Signed-off-by: shen-shanshan <467638484@qq.com>
-
- 09 Jan, 2026 1 commit
-
-
Max Hu authored
Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
-
- 08 Jan, 2026 2 commits
-
-
Lucas Wilkinson authored
[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
- 07 Jan, 2026 2 commits
-
-
Ning Xie authored
Signed-off-by: Andy Xie <andy.xning@gmail.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 06 Jan, 2026 1 commit
-
-
Ning Xie authored
Signed-off-by: Andy Xie <andy.xning@gmail.com>
-
- 05 Jan, 2026 2 commits
-
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
wangxiyuan authored
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
-
- 02 Jan, 2026 1 commit
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: njhill <nickhill123@gmail.com>
-
- 31 Dec, 2025 1 commit
-
-
Nick Hill authored
Signed-off-by: njhill <nickhill123@gmail.com>
-
- 27 Dec, 2025 1 commit
-
-
Yifan Qiao authored
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>
-
- 24 Dec, 2025 1 commit
-
-
Michael Goin authored
Signed-off-by: mgoin <mgoin64@gmail.com>
-
- 22 Dec, 2025 1 commit
-
-
Boyuan Feng authored
Signed-off-by: Boyuan Feng <boyuan@meta.com>
-
- 17 Dec, 2025 1 commit
-
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 15 Dec, 2025 1 commit
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
- 12 Dec, 2025 1 commit
-
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
- 11 Dec, 2025 1 commit
-
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
- 09 Dec, 2025 2 commits
-
-
Benjamin Chislett authored
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Wentao Ye authored
[Compile] Fix torch warning `TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled` (#29897)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
- 05 Dec, 2025 2 commits
-
-
Ilya Markov authored
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
-
Nick Hill authored
Currently, when a request is cancelled while executing its final step, "completion" is handled by normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a KV connector is involved, the connector treats the request as having completed successfully rather than as aborted.

This is problematic for disaggregated prefill, which frees KV cache blocks if the request was aborted but not if it completed successfully. Since the cancelled request will never be sent to the decode side, its KV cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger).

This PR fixes the problem by processing pending aborts immediately before processing model output each step. Only aborts are processed at that point, not new requests, since it's preferable for latency to process model outputs before new incoming requests.

Fixes #26400.

Signed-off-by: Nick Hill <nhill@redhat.com>
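
The ordering described in this commit message can be illustrated with a small sketch. It is not vLLM's implementation: `ToyEngineCore`, `FinishReason`, the two queues, and the shape of `model_output` are all hypothetical names chosen for illustration. The only point it tries to show is the order within a step: drain pending aborts, then process model output, then admit new requests.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto


class FinishReason(Enum):
    STOPPED = auto()  # normal stop processing (length limit or stop token)
    ABORTED = auto()  # cancelled by the client


@dataclass
class ToyEngineCore:
    """Hypothetical engine loop used only to illustrate step ordering."""

    running: set[str] = field(default_factory=set)
    pending_aborts: deque[str] = field(default_factory=deque)
    pending_adds: deque[str] = field(default_factory=deque)
    finished: dict[str, FinishReason] = field(default_factory=dict)

    def _drain_aborts(self) -> None:
        # Record every queued abort that still refers to a running request.
        while self.pending_aborts:
            req_id = self.pending_aborts.popleft()
            if req_id in self.running:
                self.running.remove(req_id)
                self.finished[req_id] = FinishReason.ABORTED

    def _drain_adds(self) -> None:
        # Admit newly arrived requests.
        while self.pending_adds:
            self.running.add(self.pending_adds.popleft())

    def step(self, model_output: dict[str, bool]) -> None:
        """model_output maps request id -> True if a stop condition was hit."""
        # 1. Aborts first, so a request cancelled during its final step is
        #    recorded as aborted rather than as a successful completion.
        self._drain_aborts()
        # 2. Model output next; output for already-aborted requests is skipped.
        for req_id, stopped in model_output.items():
            if req_id in self.running and stopped:
                self.running.remove(req_id)
                self.finished[req_id] = FinishReason.STOPPED
        # 3. New requests last, so they do not delay output processing.
        self._drain_adds()


# Request "a" is cancelled while its final step is in flight.
engine = ToyEngineCore(running={"a", "b"})
engine.pending_aborts.append("a")
engine.pending_adds.append("c")
engine.step({"a": True, "b": True})
```

In this sketch, request `a` ends up marked `ABORTED` even though its final step also hit a stop condition, which is the signal a KV connector would need in order to free the blocks. Request `c` is admitted only after the outputs have been handled.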
-
- 03 Dec, 2025 1 commit
-
-
Yong Hoon Shin authored
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
-