- 02 Apr, 2026 1 commit
-
-
Li, Jiang authored
Signed-off-by: jiang1.li <jiang1.li@intel.com>
-
- 01 Apr, 2026 13 commits
-
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
Jeffrey Wang authored
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
-
yzong-rh authored
Signed-off-by: Yifan <yzong@redhat.com>
Signed-off-by: Yifan Zong <yzong@redhat.com>
-
Chauncey authored
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
-
yzong-rh authored
Signed-off-by: Yifan <yzong@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
-
Elvir Crnčević authored
[Bugfix] Revert "Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding" (#38359)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
-
Lukas Geiger authored
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
-
Li, Jiang authored
Signed-off-by: jiang1.li <jiang1.li@intel.com>
-
Jeffrey Wang authored
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
Samu Tamminen authored
Signed-off-by: Samu Tamminen <stammine@amd.com>
Co-authored-by: Tuukka Sarvi <tuukka.sarvi@amd.com>
-
Yifan Qiao authored
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
-
- 31 Mar, 2026 11 commits
-
-
Stig-Arne Grönroos authored
Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com>
Signed-off-by: Stig-Arne Grönroos <sgronroo@amd.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
-
Vedant V Jhaveri authored
Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
Olya Kozlova authored
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
-
BadrBasowid authored
Signed-off-by: BadrBasowid <badr.basowid@gmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
-
Matthew Bonanni authored
Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>
-
wliao2 authored
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
Kfir Toledo authored
[kv_offload+HMA] Fix num_blocks with different per-layer page sizes and improve assert message (#38554)
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
-
Martin Hickey authored
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
-
sungsoo ha authored
Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
-
- 30 Mar, 2026 8 commits
-
-
Prathmesh Bhatt authored
Signed-off-by: Prathmesh Bhatt <71340361+Prathmesh234@users.noreply.github.com>
-
SandishKumarHN authored
Signed-off-by: SandishKumarHN <sandish@fb.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
-
Benjamin Chislett authored
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
-
fangyuchu authored
Signed-off-by: fangyuchu <fangyuchu@qq.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
-
Chendi.Xue authored
[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
-
Li, Jiang authored
Signed-off-by: jiang1.li <jiang1.li@intel.com>
-
Collin McCarthy authored
Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
-
Nicolò Lucchesi authored
[Mamba][Bugfix] Raise on insufficient cache blocks instead of silently capping cudagraph sizes (#38270)
Signed-off-by: NickLucche <nlucches@redhat.com>
-
- 29 Mar, 2026 3 commits
-
-
Kyle Sayers authored
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
-
Wentao Ye authored
[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
Andreas Karatzas authored
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
-
- 27 Mar, 2026 3 commits
-
-
Giancarlo Delfin authored
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
-
Or Ozeri authored
Signed-off-by: Or Ozeri <oro@il.ibm.com>
-
Bvicii authored
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
- 26 Mar, 2026 1 commit
-
-
yzong-rh authored
Signed-off-by: Yifan Zong <yzong@redhat.com>