- 05 Dec, 2025 1 commit
-
-
Nick Hill authored
Currently, when a request is cancelled while its final step is executing, "completion" is handled by normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a KV connector is involved, the connector treats the request as having completed successfully rather than as aborted.

This is problematic for disaggregated prefill, which frees KV cache blocks if the request was aborted but not if it completed successfully. Because the cancelled request will never be sent to the decode side, its KV cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time, since the window in which a cancellation can arrive during the final step is larger.

This PR fixes the problem by processing pending aborts immediately before processing model output in each step. Only aborts are processed at that point, not new requests, since it is preferable for latency to process model outputs before new incoming requests.

Fixes #26400.

Signed-off-by: Nick Hill <nhill@redhat.com>
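The ordering described in this commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual engine code: `ToyEngine`, `Request`, `FinishReason`, `run_final_step` and the `drain_aborts_first` flag are all hypothetical names invented for illustration. It shows how an abort that arrives during the final forward pass is lost if model output is processed first, and honoured if pending aborts are drained first.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Deque, Dict, List, Optional


class FinishReason(Enum):
    LENGTH = auto()   # normal stop: token budget reached
    ABORTED = auto()  # client cancelled the request


@dataclass
class Request:
    request_id: str
    max_tokens: int
    output_tokens: List[int] = field(default_factory=list)
    finish_reason: Optional[FinishReason] = None


class ToyEngine:
    """Hypothetical stand-in for an engine core loop (illustration only)."""

    def __init__(self) -> None:
        self.running: Dict[str, Request] = {}
        self.pending_aborts: Deque[str] = deque()
        self.finished: List[Request] = []

    def add_request(self, req: Request) -> None:
        self.running[req.request_id] = req

    def abort(self, request_id: str) -> None:
        # Aborts are only queued here; they take effect when drained.
        self.pending_aborts.append(request_id)

    def _process_aborts(self) -> None:
        while self.pending_aborts:
            req = self.running.pop(self.pending_aborts.popleft(), None)
            if req is not None:
                req.finish_reason = FinishReason.ABORTED
                self.finished.append(req)

    def step(self, model_output: Dict[str, int], drain_aborts_first: bool) -> None:
        # The fix described above: drain aborts (and only aborts)
        # before looking at the model output for this step.
        if drain_aborts_first:
            self._process_aborts()
        for request_id, token in model_output.items():
            req = self.running.get(request_id)
            if req is None:
                continue  # already aborted; discard its output
            req.output_tokens.append(token)
            if len(req.output_tokens) >= req.max_tokens:
                req.finish_reason = FinishReason.LENGTH
                del self.running[request_id]
                self.finished.append(req)


def run_final_step(drain_aborts_first: bool) -> Request:
    engine = ToyEngine()
    engine.add_request(Request("r1", max_tokens=1))
    engine.abort("r1")  # cancellation arrives during the forward pass
    engine.step({"r1": 42}, drain_aborts_first=drain_aborts_first)
    return engine.finished[0]
```

In this toy model, `run_final_step(drain_aborts_first=False)` finishes the request with `FinishReason.LENGTH`, which is the case where a connector would believe the request completed normally. With `drain_aborts_first=True` the request finishes as `FinishReason.ABORTED`, so abort-specific cleanup could run.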
-
- 03 Dec, 2025 2 commits
-
-
Yong Hoon Shin authored
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
-
Arpit Khandelwal authored
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
-
- 30 Nov, 2025 1 commit
-
-
Vensen authored
Signed-off-by: vensen <vensenmu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
-
- 26 Nov, 2025 1 commit
-
-
Lucas Wilkinson authored
-
- 21 Nov, 2025 2 commits
-
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
- 20 Nov, 2025 1 commit
-
-
Benjamin Chislett authored
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
- 19 Nov, 2025 2 commits
-
-
Julien Denize authored
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
-
Qiu authored
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
-
- 17 Nov, 2025 1 commit
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nhill@redhat.com>
-
- 16 Nov, 2025 1 commit
-
-
Lucia Fang authored
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
-
- 14 Nov, 2025 1 commit
-
-
rasmith authored
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
-
- 12 Nov, 2025 1 commit
-
-
Chenguang Zheng authored
Signed-off-by: n00909098 <nguyen.kha.long@huawei.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Signed-off-by: Khuong Le <khuong.le.manh@huawei.com>
Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com>
Co-authored-by: n00909098 <nguyen.kha.long@huawei.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Khuong Le <khuong.le.manh@huawei.com>
Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>
-
- 08 Nov, 2025 1 commit
-
-
Benjamin Chislett authored
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
-
- 07 Nov, 2025 2 commits
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nhill@redhat.com>
-
Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>
-
- 06 Nov, 2025 1 commit
-
-
Dayeol Lee authored
-
- 05 Nov, 2025 1 commit
-
-
Ilya Markov authored
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
-
- 04 Nov, 2025 2 commits
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nhill@redhat.com>
-
Mark McLoughlin authored
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
-
- 01 Nov, 2025 1 commit
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nhill@redhat.com>
-
- 31 Oct, 2025 1 commit
-
-
GuanLuo authored
Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
-
- 30 Oct, 2025 1 commit
-
-
Ilya Markov authored
[Misc] Replace CUDA_VISIBLE_DEVICES in DP with torch.cuda.set_device for device selection on cuda-like devices (#27564)

Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
-
- 27 Oct, 2025 1 commit
-
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 25 Oct, 2025 2 commits
-
-
Kuntai Du authored
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
-
Zhuohan Li authored
-
- 24 Oct, 2025 1 commit
-
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
- 23 Oct, 2025 1 commit
-
-
Ilya Markov authored
[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selection (fix DP slow startup time &c) (#26709)

Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
-
- 18 Oct, 2025 1 commit
-
-
iAmir97 authored
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
- 16 Oct, 2025 1 commit
-
-
Bram Wasti authored
Signed-off-by: Bram Wasti <bwasti@meta.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
-
- 12 Oct, 2025 2 commits
-
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
wang.yuqi authored
[MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank (#25867)

Signed-off-by: wang.yuqi <noooop@126.com>
-
- 10 Oct, 2025 1 commit
-
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 07 Oct, 2025 1 commit
-
-
Grant Holmes (Ren) authored
Signed-off-by: gholmes829 <g.holmes429@gmail.com>
-
- 05 Oct, 2025 1 commit
-
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
- 30 Sep, 2025 1 commit
-
-
David Ben-David authored
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
-
- 25 Sep, 2025 1 commit
-
-
Tyler Michael Smith authored
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
-
- 23 Sep, 2025 2 commits
-
-
kourosh hakhamaneshi authored
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-