- 09 Dec, 2025 7 commits
-
-
Or Ozeri authored
Signed-off-by:Or Ozeri <oro@il.ibm.com>
-
czhu-cohere authored
Signed-off-by:czhu-cohere <conway.zhu@cohere.com>
-
gnovack authored
Signed-off-by:gnovack <gnovack@amazon.com>
-
Yanan Cao authored
Signed-off-by:Yanan Cao <gmagogsfm@gmail.com>
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Victor Ziliang Peng authored
Signed-off-by:Ziliang Peng <ziliang@character.ai>
-
Ming Yang authored
Signed-off-by:Ming Yang <minos.future@gmail.com>
-
- 08 Dec, 2025 8 commits
-
-
Charlie Fu authored
Signed-off-by:
charlifu <charlifu@amd.com> Signed-off-by:
Charlie Fu <Charlie.Fu@amd.com>
-
roikoren755 authored
Signed-off-by:Roi Koren <roik@nvidia.com>
-
Jee Jee Li authored
Signed-off-by:Jee Jee Li <pandaleefree@gmail.com>
-
Laith Sakka authored
Signed-off-by:Laith Sakka <lsakka@meta.com>
-
Daniel Cámpora authored
Signed-off-by:
Daniel Campora <961215+dcampora@users.noreply.github.com> Co-authored-by:
Michael Goin <mgoin64@gmail.com> Co-authored-by:
Cyrus Leung <tlleungac@connect.ust.hk>
-
wang.yuqi authored
[Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. (#30249) Signed-off-by:
wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by:
wang.yuqi <noooop@126.com> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
wang.yuqi authored
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686) Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
-
daniel-salib authored
Signed-off-by:Daniel Salib <danielsalib@meta.com>
-
- 07 Dec, 2025 9 commits
-
-
ElizaWszola authored
Signed-off-by:
ElizaWszola <ewszola@redhat.com> Signed-off-by:
yewentao256 <zhyanwentao@126.com> Co-authored-by:
yewentao256 <zhyanwentao@126.com>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
Jee Jee Li authored
Signed-off-by:Jee Jee Li <pandaleefree@gmail.com>
-
Jinzhen Lin authored
Signed-off-by:
Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by:
Jee Jee Li <pandaleefree@gmail.com>
-
Cyrus Leung authored
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
jeremyteboul authored
Signed-off-by:
Jeremy Teboul <jeremyteboul@fb.com> Co-authored-by:
Jeremy Teboul <jeremyteboul@fb.com>
-
Yanan Cao authored
Signed-off-by:Yanan Cao <gmagogsfm@gmail.com>
-
- 06 Dec, 2025 8 commits
-
-
Andrew Xia authored
Signed-off-by:
Andrew Xia <axia@fb.com> Co-authored-by:
Andrew Xia <axia@fb.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Viacheslav authored
Signed-off-by:Viacheslav Barinov <viacheslav.teh@gmail.com>
-
Yu Jiaqi authored
Signed-off-by:piood <2477084691@qq.com>
-
rasmith authored
[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985) Signed-off-by:
Randall Smith <ransmith@amd.com> Co-authored-by:
Randall Smith <ransmith@amd.com> Co-authored-by:
TJian <tunjian.tan@embeddedllm.com>
-
rasmith authored
[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109) Signed-off-by:
Randall Smith <ransmith@amd.com> Co-authored-by:
Randall Smith <ransmith@amd.com>
-
Samuel Shen authored
Signed-off-by:
Samuel Shen <slshen@uchicago.edu> Co-authored-by:
Samuel Shen <slshen@uchicago.edu>
-
Deboleina authored
Signed-off-by:Debolina Roy <debroy@redhat.com>
-
- 05 Dec, 2025 8 commits
-
-
Divakar Verma authored
Signed-off-by:Divakar Verma <divakar.verma@amd.com>
-
Nicolò Lucchesi authored
Signed-off-by:NickLucche <nlucches@redhat.com>
-
Russell Bryant authored
Signed-off-by:Russell Bryant <rbryant@redhat.com>
-
Ilya Markov authored
Signed-off-by:
Luka Govedič <lgovedic@redhat.com> Signed-off-by:
Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by:
ilmarkov <markovilya197@gmail.com> Signed-off-by:
Luka Govedič <luka.govedic@gmail.com> Signed-off-by:
ProExpertProg <lgovedic@redhat.com> Co-authored-by:
Luka Govedič <lgovedic@redhat.com> Co-authored-by:
Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by:
Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by:
Luka Govedič <luka.govedic@gmail.com>
-
Matthew Bonanni authored
Signed-off-by:
Matthew Bonanni <mbonanni@redhat.com> Signed-off-by:
Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by:
Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by:
Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Nick Hill authored
Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a kv connector is involved it thinks the request completed successfully rather than being aborted. This is problematic for disaggregated prefill which will free kv cache blocks if the request was aborted but not if it completed successfully—since the cancelled request will never be sent to the decode side, kv cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step; we process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests. Fixes #26400. Signed-off-by:Nick Hill <nhill@redhat.com>
-
Nicolò Lucchesi authored
Signed-off-by:
NickLucche <nlucches@redhat.com> Co-authored-by:
Cyrus Leung <tlleungac@connect.ust.hk>
-