- 06 Dec, 2025 19 commits
-
Viacheslav authored
Signed-off-by:Viacheslav Barinov <viacheslav.teh@gmail.com>
-
Chukwuma Nwaugha authored
Signed-off-by:Chukwuma Nwaugha <nwaughac@gmail.com>
-
Ye (Charlotte) Qi authored
Signed-off-by:Ye (Charlotte) Qi <yeq@meta.com>
-
Yu Jiaqi authored
Signed-off-by:piood <2477084691@qq.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
redwrasse authored
Signed-off-by:redwrasse <mail@redwrasse.io>
-
kx authored
Signed-off-by:01267596 <xiongkai123@cmbchina.com>
Co-authored-by:01267596 <xiongkai123@cmbchina.com>
-
Woosuk Kwon authored
Signed-off-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Nick Hill authored
Signed-off-by:Nick Hill <nhill@redhat.com>
-
rasmith authored
[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985)
Signed-off-by:Randall Smith <ransmith@amd.com>
Co-authored-by:Randall Smith <ransmith@amd.com>
Co-authored-by:TJian <tunjian.tan@embeddedllm.com>
-
Rohan Potdar authored
Signed-off-by:Rohan138 <rohanpotdar138@gmail.com>
-
Peter Salas authored
Signed-off-by:Peter Salas <peter@fixie.ai>
-
Dongjie Zou authored
Signed-off-by:baonudesifeizhai <baonudesifeizhai@gmail.com>
-
yuttian1 authored
Signed-off-by:yuttian1 <yuttian@amd.com>
-
rasmith authored
[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109)
Signed-off-by:Randall Smith <ransmith@amd.com>
Co-authored-by:Randall Smith <ransmith@amd.com>
-
Harry Mellor authored
Better error when world size is larger than node and `distributed_executor_backend` is not set (#30140)
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Samuel Shen authored
Signed-off-by:Samuel Shen <slshen@uchicago.edu>
Co-authored-by:Samuel Shen <slshen@uchicago.edu>
-
rasmith authored
[CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round (#30151)
Signed-off-by:Randall Smith <ransmith@amd.com>
Co-authored-by:Randall Smith <ransmith@amd.com>
-
Deboleina authored
Signed-off-by:Debolina Roy <debroy@redhat.com>
-
- 05 Dec, 2025 21 commits
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Bangsheng Tang authored
-
Divakar Verma authored
Signed-off-by:Divakar Verma <divakar.verma@amd.com>
-
Nicolò Lucchesi authored
Signed-off-by:NickLucche <nlucches@redhat.com>
-
Russell Bryant authored
Signed-off-by:Russell Bryant <rbryant@redhat.com>
-
Nicolò Lucchesi authored
Signed-off-by:NickLucche <nlucches@redhat.com>
-
Tova Movshovitz authored
Signed-off-by:tovam <tovam@pliops.com>
Signed-off-by:Tova Movshovitz <tovam@pliops.com>
Co-authored-by:gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
Ilya Markov authored
Signed-off-by:Luka Govedič <lgovedic@redhat.com>
Signed-off-by:Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by:ilmarkov <markovilya197@gmail.com>
Signed-off-by:Luka Govedič <luka.govedic@gmail.com>
Signed-off-by:ProExpertProg <lgovedic@redhat.com>
Co-authored-by:Luka Govedič <lgovedic@redhat.com>
Co-authored-by:Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by:Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by:Luka Govedič <luka.govedic@gmail.com>
-
Matthew Bonanni authored
Signed-off-by:Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by:Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by:gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by:Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Nick Hill authored
Currently, when a request is cancelled while executing its final step, "completion" is handled by normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a KV connector is involved, the connector thinks the request completed successfully rather than being aborted.

This is problematic for disaggregated prefill, which frees KV cache blocks if the request was aborted but not if it completed successfully. Since the cancelled request will never be sent to the decode side, its KV cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger).

This PR fixes the problem by processing pending aborts immediately prior to processing model output each step. Only aborts are processed at that point, not new requests, since it is preferable for latency to process model outputs before new incoming requests.

Fixes #26400.
Signed-off-by:Nick Hill <nhill@redhat.com>
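
The ordering described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyEngine`, `Status`, `drain_aborts`, and `process_model_output` are hypothetical names invented for illustration and are not vLLM's actual classes or methods. It shows aborts being applied before the step's model output is processed, while new requests stay queued.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    RUNNING = auto()
    FINISHED_STOPPED = auto()   # normal stop (length or stop token)
    FINISHED_ABORTED = auto()   # cancelled by the client


@dataclass
class ToyEngine:
    """Hypothetical stand-in for an engine loop; not vLLM's real API."""
    running: dict = field(default_factory=dict)  # request id -> Status
    inbox: deque = field(default_factory=deque)  # queued ("abort" | "add", request id)

    def drain_aborts(self):
        """Apply only the queued aborts; keep other messages queued in order."""
        deferred = deque()
        while self.inbox:
            kind, req_id = self.inbox.popleft()
            if kind == "abort":
                if self.running.get(req_id) is Status.RUNNING:
                    self.running[req_id] = Status.FINISHED_ABORTED
            else:
                deferred.append((kind, req_id))
        self.inbox = deferred

    def process_model_output(self, finished_ids):
        """Mark normally finished requests, leaving aborted ones untouched."""
        for req_id in finished_ids:
            if self.running.get(req_id) is Status.RUNNING:
                self.running[req_id] = Status.FINISHED_STOPPED

    def step(self, finished_ids):
        # Aborts are handled first so that a cancellation arriving during the
        # final step is recorded as an abort, not as a successful completion.
        self.drain_aborts()
        self.process_model_output(finished_ids)


engine = ToyEngine(running={"a": Status.RUNNING, "b": Status.RUNNING})
engine.inbox.extend([("abort", "a"), ("add", "c")])
engine.step(finished_ids=["a", "b"])
```

In this toy run, request `a` reaches its stop condition in the same step in which its abort is pending, and it ends up marked as aborted, whereas `b` finishes normally and the new request `c` stays in the queue until after the output has been processed.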
-
Nicolò Lucchesi authored
Signed-off-by:NickLucche <nlucches@redhat.com>
Co-authored-by:Cyrus Leung <tlleungac@connect.ust.hk>
-
Angela Yi authored
Signed-off-by:angelayi <yiangela7@gmail.com>
-
Andrew Xia authored
Signed-off-by:Andrew Xia <axia@fb.com>
Signed-off-by:Andrew Xia <axia@meta.com>
Co-authored-by:Andrew Xia <axia@fb.com>
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Alec S authored
Signed-off-by:Alec Solder <alecs@fb.com>
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by:Alec Solder <alecs@fb.com>
Co-authored-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Yi Liu authored
Signed-off-by:yiliu30 <yi4.liu@intel.com>
-
Elham authored
Signed-off-by:Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal>
Signed-off-by:Elham Harirpoush <elham.harirpoush@arm.com>
Co-authored-by:Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Max Hu authored
Signed-off-by:Max Hu <hyoung2991@gmail.com>
Signed-off-by:Max Hu <maxhu@nvidia.com>
Co-authored-by:Max Hu <maxhu@nvidia.com>
-
Zhiwei authored
Signed-off-by:ZhiweiYan-96 <zhiwei.yan@amd.com>
-