- 05 Dec, 2025 21 commits
-
Mark McLoughlin authored
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
-
Nick Hill authored
Currently, when a request is cancelled while executing its final step, completion is handled by normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a KV connector is involved, the connector believes the request completed successfully rather than being aborted. This is problematic for disaggregated prefill, which frees KV cache blocks if the request was aborted but not if it completed successfully. Since the cancelled request will never be sent to the decode side, its KV cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step. Only aborts are processed at that point, not new requests, since it's preferable for latency to process model outputs before new incoming requests.
Fixes #26400.
Signed-off-by: Nick Hill <nhill@redhat.com>
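The ordering described in this commit message can be illustrated with a toy engine loop. This is a minimal sketch, not vLLM's implementation: `ToyEngine`, `Request`, `inbox`, `released`, and the `forward` callback are all hypothetical names invented for the example. It only shows the idea of draining queued aborts (and nothing else) after the forward pass and before model output is processed.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request record; not a vLLM type.
    req_id: str
    max_tokens: int
    output: list = field(default_factory=list)
    status: str = "running"  # "running" | "finished" | "aborted"


class ToyEngine:
    def __init__(self):
        self.running = {}     # req_id -> Request
        self.inbox = deque()  # ("add", Request) or ("abort", req_id)
        self.released = []    # req_ids whose (pretend) KV blocks were freed

    def _abort(self, req_id):
        req = self.running.pop(req_id, None)
        if req is not None:
            req.status = "aborted"
            self.released.append(req_id)  # aborted => blocks freed at once

    def _handle_all_messages(self):
        # Start of a step: admit new requests and apply aborts, in order.
        while self.inbox:
            kind, payload = self.inbox.popleft()
            if kind == "add":
                self.running[payload.req_id] = payload
            else:
                self._abort(payload)

    def _drain_aborts_only(self):
        # Apply queued aborts now; leave new requests queued for the next
        # step, so model output is processed before new work is admitted.
        aborted = [p for kind, p in self.inbox if kind == "abort"]
        self.inbox = deque(
            (kind, p) for kind, p in self.inbox
            if kind != "abort" and p.req_id not in aborted
        )
        for req_id in aborted:
            self._abort(req_id)

    def step(self, forward):
        self._handle_all_messages()
        batch = list(self.running)
        new_tokens = forward(batch)  # slow; aborts may arrive meanwhile
        self._drain_aborts_only()    # the ordering this sketch is about
        finished = []
        for req_id, token in new_tokens.items():
            req = self.running.get(req_id)
            if req is None:
                continue  # aborted during the forward pass: drop its output
            req.output.append(token)
            if len(req.output) >= req.max_tokens:
                req.status = "finished"
                del self.running[req_id]
                finished.append(req_id)
        return finished
```

In this sketch, a request whose abort arrives while `forward` is running ends up with status `"aborted"` and appears in `released`, instead of being reported as finished because it happened to hit its length limit on the same step.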
-
Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
Angela Yi authored
Signed-off-by: angelayi <yiangela7@gmail.com>
-
Andrew Xia authored
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
-
Mark McLoughlin authored
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
-
Alec S authored
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Yi Liu authored
Signed-off-by: yiliu30 <yi4.liu@intel.com>
-
Max Hu authored
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
-
Zhiwei authored
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>
-
strinczer authored
Signed-off-by: Shai Trinczer <strinczer@icloud.com>
Signed-off-by: strinczer <strinczer@icloud.com>
-
Ning Xie authored
Signed-off-by: Andy Xie <andy.xning@gmail.com>
-
Ming Yang authored
Signed-off-by: Ming Yang <minos.future@gmail.com>
-
Alec S authored
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
-
Yanan Cao authored
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
-
amitz-nv authored
[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
-
Jingchun Gao authored
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
-
Laith Sakka authored
Signed-off-by: Laith Sakka <lsakka@meta.com>
-
Qiu authored
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
Hubert de La Jonquiere authored
Signed-off-by: hdlj-h <hubert@hcompany.ai>
-
Alexander Matveev authored
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
-
- 04 Dec, 2025 19 commits
-
Laith Sakka authored
Signed-off-by: Laith Sakka <lsakka@meta.com>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
Peng-YM authored
[Bugfix] Missing tokens in `return_token_ids` when tool parsers is enabled in streaming mode (#29074)
Signed-off-by: Peng-YM <1048217874pengym@gmail.com>
-
Mercykid-bash authored
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Signed-off-by: Mercykid-bash <ruanche0218@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Che Ruan <cr623@ic.ac.uk>
Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Kuntai Du authored
[KVConnector] Remove v0-related kv connector components such as kv pipe and kv lookup buffer (#29705)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
-
Jee Jee Li authored
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
-
Tao Yun authored
Signed-off-by: taoyun <1069423820@qq.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Yongtao Huang authored
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
wang.yuqi authored
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
Chauncey authored
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
Andreas Karatzas authored
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
-
Noa Neria authored
Signed-off-by: Noa Neria <noa@run.ai>
-
dtc authored
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: dtc <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
-
Arpit Khandelwal authored
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-