- 05 Dec, 2025 15 commits
-
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
Ilya Markov authored
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
-
Mark McLoughlin authored
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
-
Nick Hill authored
Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a KV connector is involved, the connector thinks the request completed successfully rather than being aborted.

This is problematic for disaggregated prefill, which will free KV cache blocks if the request was aborted but not if it completed successfully. Since the cancelled request will never be sent to the decode side, its KV cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger).

This PR fixes the problem by processing pending aborts immediately prior to processing model output each step. We process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests.

Fixes #26400.

Signed-off-by: Nick Hill <nhill@redhat.com>
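The ordering described in this commit message can be illustrated with a minimal sketch. It is a toy model, not vLLM's code: `ToyEngine`, `ToyRequest`, and every method and field name below are hypothetical, and "freeing KV blocks" is reduced to appending the request id to a list. It only shows why draining the abort queue before handling model output lets a request cancelled during its final step end up as aborted rather than finished.

```python
from dataclasses import dataclass, field
from queue import Empty, SimpleQueue


@dataclass
class ToyRequest:
    # Hypothetical stand-in for an in-flight request.
    request_id: str
    max_tokens: int
    tokens: list = field(default_factory=list)
    status: str = "running"  # "running", "finished", or "aborted"


class ToyEngine:
    # Hypothetical engine loop; names are illustrative, not vLLM's API.
    def __init__(self):
        self.requests = {}
        self.abort_queue = SimpleQueue()  # abort messages from clients
        self.freed_kv = []  # ids whose KV blocks were released on abort

    def add_request(self, req):
        self.requests[req.request_id] = req

    def abort(self, request_id):
        # Called asynchronously, possibly while a forward pass is running.
        self.abort_queue.put(request_id)

    def _process_pending_aborts(self):
        # Drain only abort messages; new requests would be handled elsewhere.
        while True:
            try:
                rid = self.abort_queue.get_nowait()
            except Empty:
                return
            req = self.requests.get(rid)
            if req is not None and req.status == "running":
                req.status = "aborted"
                self.freed_kv.append(rid)  # stand-in for freeing KV blocks

    def step(self, model_output):
        # Aborts are applied first, so a request cancelled during its final
        # step is recorded as aborted instead of as a normal completion.
        self._process_pending_aborts()
        for rid, token in model_output.items():
            req = self.requests.get(rid)
            if req is None or req.status != "running":
                continue  # discard output for aborted or unknown requests
            req.tokens.append(token)
            if len(req.tokens) >= req.max_tokens:
                req.status = "finished"  # normal length-based stop
```

In this sketch, a request that is aborted while its last forward pass is in flight never reaches the length-based stop check, so whatever component watches for aborted requests (the KV connector, in the commit message's terms) sees the abort and can release its blocks immediately.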
-
Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
Andrew Xia authored
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
-
Mark McLoughlin authored
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
-
strinczer authored
Signed-off-by: Shai Trinczer <strinczer@icloud.com>
Signed-off-by: strinczer <strinczer@icloud.com>
-
Alec S authored
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
-
Yanan Cao authored
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
-
rasmith authored
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
-
Chukwuma Nwaugha authored
Signed-off-by: nwaughac@gmail.com
-
Charlie Fu authored
Signed-off-by: charlifu <charlifu@amd.com>
-
Hubert de La Jonquiere authored
Signed-off-by: hdlj-h <hubert@hcompany.ai>
-
- 04 Dec, 2025 20 commits
-
-
Laith Sakka authored
Signed-off-by: Laith Sakka <lsakka@meta.com>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
Mercykid-bash authored
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Signed-off-by: Mercykid-bash <ruanche0218@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Che Ruan <cr623@ic.ac.uk>
Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Kuntai Du authored
[KVConnector] Remove v0-related kv connector components such as kv pipe and kv lookup buffer (#29705)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
-
Qiu authored
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Doug Smith authored
Signed-off-by: dougbtv <dosmith@redhat.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
wang.yuqi authored
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
Chauncey authored
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
Andreas Karatzas authored
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
-
Noa Neria authored
Signed-off-by: Noa Neria <noa@run.ai>
-
rasmith authored
[CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER (#29995)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
-
Arpit Khandelwal authored
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
Mark McLoughlin authored
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
-
Micah Williamson authored
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
-
Charlie Fu authored
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Benjamin Bartels authored
[Frontend] Fixes anthropic /v1/messages streaming not containing input_tokens on first chunk (#29971)
Signed-off-by: bbartels <benjamin@bartels.dev>
-
- 03 Dec, 2025 5 commits
-
-
Shengqi Chen authored
[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
-
Elizabeth Thomas authored
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Jane Xu <janeyx@meta.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Johnny Yang <johnnyyang@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: bruceszchen <bruceszchen@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com>
-
bnellnm authored
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
-
Varun Sundar Rabindranath authored
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
-
avigny authored
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Jeff Cook <jeff@jeffcook.io>
Co-authored-by: sfbemerk <benjaminmerkel@mail.de>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-