- 17 Dec, 2025 2 commits
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Jialin Ouyang authored
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
-
- 16 Dec, 2025 4 commits
-
Roger Wang authored
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Sun Kim <sunytokki@gmail.com>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
jiangkuaixue123 authored
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
-
- 15 Dec, 2025 1 commit
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
- 12 Dec, 2025 2 commits
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
- 11 Dec, 2025 4 commits
-
Xingyu Liu authored
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
Martin Hickey authored
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
-
Qiu authored
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
- 10 Dec, 2025 6 commits
-
shivampr authored
Signed-off-by: Shivam <shivamprasad91@gmail.com>
Signed-off-by: shivampr <shivampr.dev@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
-
Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>
-
Aditya Tewari authored
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>
-
Daniele authored
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
-
Wilson Wu authored
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
-
Lucas Wilkinson authored
[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
-
- 09 Dec, 2025 6 commits
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
-
Benjamin Chislett authored
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Wentao Ye authored
[Compile] Fix torch warning `TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled` (#29897)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
- 07 Dec, 2025 3 commits
-
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
Cyrus Leung authored
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 06 Dec, 2025 2 commits
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 05 Dec, 2025 3 commits
-
Ilya Markov authored
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
-
Nick Hill authored
Currently, when requests are cancelled while executing their final step, "completion" is handled by normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a KV connector is involved, the connector concludes that the request completed successfully rather than being aborted.

This is problematic for disaggregated prefill, which frees KV cache blocks if the request was aborted but not if it completed successfully. Since the cancelled request will never be sent to the decode side, its KV cache blocks remain pinned until the fallback timeout expires. The problem is exacerbated when many requests are cancelled and/or when large prefills have long forward passes, since the window for this race is larger.

This PR fixes the problem by processing pending aborts immediately before processing model output in each step. Only aborts are processed at that point, not new requests, because for latency it is preferable to process model outputs before new incoming requests.

Fixes #26400.
Signed-off-by: Nick Hill <nhill@redhat.com>
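The step ordering this commit describes can be illustrated with a toy Python model. This is a hypothetical sketch, not vLLM's scheduler or engine code: `ToyEngine`, `Request`, `pending_aborts`, `pending_new`, and the `finished` list (standing in for what a KV connector would be told) are all illustrative names. Each step drains pending aborts first, then processes model output only for requests that are still running, and admits new requests last.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record; only the fields this sketch needs.
    request_id: str
    status: str = "RUNNING"


class ToyEngine:
    """Toy model of an engine step loop (hypothetical, not vLLM's API)."""

    def __init__(self) -> None:
        self.running: dict[str, Request] = {}
        self.pending_aborts: set[str] = set()  # aborts that arrive mid-step
        self.pending_new: deque[Request] = deque()  # new requests that arrive mid-step
        # (request_id, reason) pairs, standing in for what a KV connector is told.
        self.finished: list[tuple[str, str]] = []

    def abort(self, request_id: str) -> None:
        # Called asynchronously, e.g. while the forward pass is running.
        self.pending_aborts.add(request_id)

    def add_request(self, request: Request) -> None:
        self.pending_new.append(request)

    def _process_aborts(self) -> None:
        # Sorted only to make the order of reported aborts deterministic.
        for request_id in sorted(self.pending_aborts):
            request = self.running.pop(request_id, None)
            if request is not None:
                request.status = "FINISHED_ABORTED"
                self.finished.append((request_id, "abort"))
        self.pending_aborts.clear()

    def step(self, model_output: dict[str, bool]) -> None:
        """model_output maps request_id -> whether this step hit a stop condition."""
        # 1. Drain aborts first, so a request cancelled during its final
        #    forward pass is reported as aborted, not as completed.
        self._process_aborts()
        # 2. Process model output only for requests that are still running.
        for request_id, hit_stop in model_output.items():
            request = self.running.get(request_id)
            if request is None:
                continue  # aborted above; its output is discarded
            if hit_stop:
                request.status = "FINISHED_STOPPED"
                del self.running[request_id]
                self.finished.append((request_id, "stop"))
        # 3. Admit new requests only after model output has been processed.
        while self.pending_new:
            request = self.pending_new.popleft()
            self.running[request.request_id] = request


if __name__ == "__main__":
    engine = ToyEngine()
    for rid in ("a", "b", "c"):
        engine.add_request(Request(rid))
    engine.step({})  # admits a, b, c
    engine.abort("a")  # cancelled while its final step is executing
    engine.add_request(Request("d"))
    engine.step({"a": True, "b": True, "c": False})
    print(engine.finished)  # [('a', 'abort'), ('b', 'stop')]
```

In this model, request `a` is reported as aborted even though its final output hit a stop condition, which is the signal a disaggregated-prefill connector would need in order to free its blocks promptly instead of waiting for a timeout.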
-
Max Hu authored
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
-
- 04 Dec, 2025 4 commits
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
wang.yuqi authored
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
- 03 Dec, 2025 3 commits
-
Yong Hoon Shin authored
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
-
Arpit Khandelwal authored
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-