- 10 Dec, 2025 4 commits
-
-
Lucas Wilkinson authored
[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624) Signed-off-by:
Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by:
Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by:
Benjamin Chislett <chislett.ben@gmail.com>
-
ElizaWszola authored
-
PatrykSaffer authored
Signed-off-by:
Patryk Saffer <patryk.saffer99@gmail.com> Signed-off-by:
PatrykSaffer <patryk.saffer@mistral.ai> Co-authored-by:
Patryk Saffer <patryk.saffer99@gmail.com>
-
dongbo910220 authored
Signed-off-by:dongbo910220 <1275604947@qq.com>
-
- 09 Dec, 2025 36 commits
-
-
Hashem Hashemi authored
Signed-off-by:Hashem Hashemi <hashem.hashemi@amd.com>
-
Charlie Fu authored
[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693) Signed-off-by:
charlifu <charlifu@amd.com> Signed-off-by:
Micah Williamson <micah.williamson@amd.com> Signed-off-by:
Charlie Fu <Charlie.Fu@amd.com> Co-authored-by:
Micah Williamson <micah.williamson@amd.com> Co-authored-by:
wuhuikx <hattie.wu@amd.com> Co-authored-by:
Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by:
Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
-
Kyle Sayers authored
Signed-off-by:Kyle Sayers <kylesayrs@gmail.com>
-
bnellnm authored
Signed-off-by:
Bill Nell <bnell@redhat.com> Signed-off-by:
Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by:
Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by:
Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by:
Tyler Michael Smith <tlrmchlsmth@gmail.com>
-
rasmith authored
[CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py (#29145)
-
dependabot[bot] authored
Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
dependabot[bot] authored
Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Tsukasa OI authored
Signed-off-by:Tsukasa OI <floss_llm@irq.a4lg.com>
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
Lucas Wilkinson authored
Signed-off-by:
Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by:
Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by:
Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by:
Matthew Bonanni <mbonanni@redhat.com>
-
Benjamin Chislett authored
Signed-off-by:
Benjamin Chislett <bchislett@nvidia.com> Signed-off-by:
Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by:
Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Woosuk Kwon authored
Signed-off-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Woosuk Kwon authored
Signed-off-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Ilya Markov authored
Signed-off-by:ilmarkov <markovilya197@gmail.com>
-
Alexei-V-Ivanov-AMD authored
Signed-off-by:Alexei V. Ivanov <alexei.ivanov@amd.com>
-
Wentao Ye authored
[Compile] Fix torch warning `TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled` (#29897) Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
liuquan authored
Signed-off-by:
quanliu <18646313696@163.com> Co-authored-by:
Wentao Ye <44945378+yewentao256@users.noreply.github.com>
-
Julien Denize authored
Signed-off-by:juliendenize <julien.denize@mistral.ai>
-
vllmellm authored
Signed-off-by:vllmellm <vllm.ellm@embeddedllm.com>
-
Dongjie Zou authored
Signed-off-by:baonudesifeizhai <baonudesifeizhai@gmail.com>
-
haoyangli-amd authored
Signed-off-by:Haoyang Li <lihaoyang0109@gmail.com>
-
Hubert de La Jonquiere authored
[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. (#30056)
-
Jaya Yuan authored
Signed-off-by:FENP <yuanyongjie.yyj@antgroup.com>
-
wang.yuqi authored
Signed-off-by:
wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by:
wang.yuqi <noooop@126.com> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
Micah Williamson authored
Signed-off-by:Micah Williamson <micah.williamson@amd.com>
-
Lucas Wilkinson authored
Signed-off-by:
Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by:
Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by:
Benjamin Chislett <chislett.ben@gmail.com>
-
Yongtao Huang authored
Signed-off-by:Yongtao Huang <yongtaoh2022@gmail.com>
-
Tsukasa OI authored
[Model][Quantization] Restore MoE + GGUF models support (incl. Qwen3 MoE) by allowing Sideload Parameters (#30116) Signed-off-by:
Tsukasa OI <floss_llm@irq.a4lg.com> Co-authored-by:
Isotr0py <mozf@mail2.sysu.edu.cn>
-
Fanli Lin authored
Signed-off-by:Lin, Fanli <fanli.lin@intel.com>
-
Fadi Arafeh authored
Signed-off-by:Fadi Arafeh <fadi.arafeh@arm.com>
-
liangel-02 authored
Signed-off-by:Angel Li <liangel@meta.com>
-
Or Ozeri authored
Signed-off-by:Or Ozeri <oro@il.ibm.com>
-
Michael Goin authored
Signed-off-by:
mgoin <mgoin64@gmail.com> Signed-off-by:
Michael Goin <mgoin64@gmail.com> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
czhu-cohere authored
Signed-off-by:czhu-cohere <conway.zhu@cohere.com>
-
gnovack authored
Signed-off-by:gnovack <gnovack@amazon.com>
-