- 10 Dec, 2025 3 commits
-
-
rasmith authored
[CI/Build][AMD] Skip quantization kernels tests that require CUTLASS or e4m3fn when not supported by platform (#30020) Signed-off-by:
Randall Smith <ransmith@amd.com> Co-authored-by:
Randall Smith <ransmith@amd.com>
-
Andrew Xia authored
Signed-off-by:
Andrew Xia <axia@fb.com> Co-authored-by:
Andrew Xia <axia@fb.com>
-
Lucas Wilkinson authored
[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624) Signed-off-by:
Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by:
Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by:
Benjamin Chislett <chislett.ben@gmail.com>
-
- 09 Dec, 2025 21 commits
-
-
Charlie Fu authored
[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693) Signed-off-by:
charlifu <charlifu@amd.com> Signed-off-by:
Micah Williamson <micah.williamson@amd.com> Signed-off-by:
Charlie Fu <Charlie.Fu@amd.com> Co-authored-by:
Micah Williamson <micah.williamson@amd.com> Co-authored-by:
wuhuikx <hattie.wu@amd.com> Co-authored-by:
Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by:
Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
-
Kyle Sayers authored
Signed-off-by:Kyle Sayers <kylesayrs@gmail.com>
-
rasmith authored
[CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py (#29145)
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
Benjamin Chislett authored
Signed-off-by:
Benjamin Chislett <bchislett@nvidia.com> Signed-off-by:
Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by:
Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Ilya Markov authored
Signed-off-by:ilmarkov <markovilya197@gmail.com>
-
Wentao Ye authored
[Compile] Fix torch warning `TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled` (#29897) Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
liuquan authored
Signed-off-by:
quanliu <18646313696@163.com> Co-authored-by:
Wentao Ye <44945378+yewentao256@users.noreply.github.com>
-
Julien Denize authored
Signed-off-by:juliendenize <julien.denize@mistral.ai>
-
Hubert de La Jonquiere authored
[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. (#30056)
-
Jaya Yuan authored
Signed-off-by:FENP <yuanyongjie.yyj@antgroup.com>
-
Micah Williamson authored
Signed-off-by:Micah Williamson <micah.williamson@amd.com>
-
Lucas Wilkinson authored
Signed-off-by:
Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by:
Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by:
Benjamin Chislett <chislett.ben@gmail.com>
-
Or Ozeri authored
Signed-off-by:Or Ozeri <oro@il.ibm.com>
-
czhu-cohere authored
Signed-off-by:czhu-cohere <conway.zhu@cohere.com>
-
gnovack authored
Signed-off-by:gnovack <gnovack@amazon.com>
-
Yanan Cao authored
Signed-off-by:Yanan Cao <gmagogsfm@gmail.com>
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Victor Ziliang Peng authored
Signed-off-by:Ziliang Peng <ziliang@character.ai>
-
Ming Yang authored
Signed-off-by:Ming Yang <minos.future@gmail.com>
-
- 08 Dec, 2025 8 commits
-
-
Charlie Fu authored
Signed-off-by:
charlifu <charlifu@amd.com> Signed-off-by:
Charlie Fu <Charlie.Fu@amd.com>
-
roikoren755 authored
Signed-off-by:Roi Koren <roik@nvidia.com>
-
Jee Jee Li authored
Signed-off-by:Jee Jee Li <pandaleefree@gmail.com>
-
Laith Sakka authored
Signed-off-by:Laith Sakka <lsakka@meta.com>
-
Daniel Cámpora authored
Signed-off-by:
Daniel Campora <961215+dcampora@users.noreply.github.com> Co-authored-by:
Michael Goin <mgoin64@gmail.com> Co-authored-by:
Cyrus Leung <tlleungac@connect.ust.hk>
-
wang.yuqi authored
[Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. (#30249) Signed-off-by:
wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by:
wang.yuqi <noooop@126.com> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
wang.yuqi authored
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686) Signed-off-by:wang.yuqi <yuqi.wang@daocloud.io>
-
daniel-salib authored
Signed-off-by:Daniel Salib <danielsalib@meta.com>
-
- 07 Dec, 2025 8 commits
-
-
ElizaWszola authored
Signed-off-by:
ElizaWszola <ewszola@redhat.com> Signed-off-by:
yewentao256 <zhyanwentao@126.com> Co-authored-by:
yewentao256 <zhyanwentao@126.com>
-
Isotr0py authored
Signed-off-by:Isotr0py <mozf@mail2.sysu.edu.cn>
-
Jee Jee Li authored
Signed-off-by:Jee Jee Li <pandaleefree@gmail.com>
-
Jinzhen Lin authored
Signed-off-by:
Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by:
Jee Jee Li <pandaleefree@gmail.com>
-
Cyrus Leung authored
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
jeremyteboul authored
Signed-off-by:
Jeremy Teboul <jeremyteboul@fb.com> Co-authored-by:
Jeremy Teboul <jeremyteboul@fb.com>
-