- 16 Jan, 2026 1 commit
-
-
xiabo authored
vllm: export VLLM_CUSTOM_CACHE=1; dtk: export HIP_KERNEL_EVENT_SYSTENFENCE=1. 2. kvcache supports fp8
-
- 01 Oct, 2025 1 commit
-
-
Yongye Zhu authored
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
-
- 23 Sep, 2025 1 commit
-
-
rivos-shreeasish authored
Signed-off-by: Shreeasish Kumar <shreeasish@rivosinc.com>
-
- 17 Sep, 2025 1 commit
-
-
Aidyn-A authored
Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>
-
- 13 Sep, 2025 1 commit
-
-
elvischenv authored
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
-
- 05 Aug, 2025 1 commit
-
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
-
- 30 Jul, 2025 1 commit
-
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
- 26 Jul, 2025 1 commit
-
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
- 22 Jul, 2025 2 commits
-
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
Mickaël Seznec authored
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: mgoin <mgoin64@gmail.com>
-
- 16 Jun, 2025 1 commit
-
-
Lu Fang authored
Signed-off-by: Lu Fang <lufang@fb.com>
-
- 12 Jun, 2025 1 commit
-
-
zhuwenwen authored
-
- 03 Jun, 2025 1 commit
-
-
Michael Goin authored
Signed-off-by: mgoin <mgoin64@gmail.com>
-
- 14 May, 2025 1 commit
-
-
xiabo authored
-
- 07 May, 2025 1 commit
-
-
Michael Goin authored
Signed-off-by: mgoin <mgoin64@gmail.com>
-
- 31 Mar, 2025 2 commits
-
-
Charlie Fu authored
Signed-off-by: charlifu <charlifu@amd.com>
-
zhuwenwen authored
-
- 15 Mar, 2025 1 commit
-
-
Lu Fang authored
Signed-off-by: Lu Fang <lufang@fb.com>
-
- 14 Mar, 2025 1 commit
-
-
Jeff Daily authored
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
-
- 11 Mar, 2025 1 commit
-
-
Jeff Daily authored
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
-
- 27 Feb, 2025 1 commit
-
-
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 authored
Signed-off-by: Hollow Man <hollowman@opensuse.org>
-
- 25 Feb, 2025 1 commit
-
-
Gregory Shtrasberg authored
-
- 20 Feb, 2025 1 commit
-
-
Gregory Shtrasberg authored
-
- 13 Dec, 2024 1 commit
-
-
Luka Govedič authored
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
-
- 08 Nov, 2024 1 commit
-
-
Luka Govedič authored
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: youkaichao <youkaichao@126.com>
-
- 16 Oct, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 04 Oct, 2024 1 commit
-
-
Lucas Wilkinson authored
-
- 22 Aug, 2024 1 commit
-
-
Luka Govedič authored
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 16 Aug, 2024 1 commit
-
-
Charlie Fu authored
-
- 05 Aug, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 30 Jul, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 26 Jul, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 22 Jul, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 21 Jul, 2024 1 commit
-
-
Alexander Matveev authored
-
- 20 Jul, 2024 1 commit
-
-
Varun Sundar Rabindranath authored
Co-authored-by: Varun Sundar Rabindranth <varun@neuralmagic.com>
-
- 18 Jul, 2024 1 commit
-
-
Varun Sundar Rabindranath authored
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
-
- 03 Jul, 2024 1 commit
-
-
Michael Goin authored
-
- 12 Jun, 2024 1 commit
-
-
Cody Yu authored
Inspired by #5146, this PR improves the FP8 quantize kernel by vectorizing data transfers to better utilize memory bandwidth. Microbenchmarks show that the improved kernel achieves a 1.0x-1.5x speedup (especially when the hidden size is large). In detail, we applied 3 optimizations:
  - Use an inverted scale so that most divisions become multiplications.
  - Unroll the loop 4 times to improve ILP.
  - Use 4-element vectorized accesses to transfer data between HBM and SRAM.
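The inverted-scale and 4-at-a-time ideas can be illustrated with a small host-side sketch in plain Python. This is not the PR's kernel: the names `quantize_scaled`, `clamp_e4m3` and `FP8_E4M3_MAX` are hypothetical, the E4M3 format is assumed, and only the scale-and-clamp step is modeled (rounding to the FP8 mantissa is left out). Each group of four elements stands in for one vectorized load/store on the GPU.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format (assumed format)


def clamp_e4m3(x):
    """Clamp a value into the finite E4M3 range; mantissa rounding is not modeled."""
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))


def quantize_scaled(values, scale, vec_width=4):
    """Scale and clamp `values`, handling `vec_width` elements per main-loop step."""
    inv_scale = 1.0 / scale  # one division up front; the loops below only multiply
    out = [0.0] * len(values)
    n_vec = (len(values) // vec_width) * vec_width

    # Main loop: one group of `vec_width` elements per iteration, standing in
    # for a single vectorized load and store in a GPU kernel.
    for i in range(0, n_vec, vec_width):
        chunk = values[i:i + vec_width]
        out[i:i + vec_width] = [clamp_e4m3(x * inv_scale) for x in chunk]

    # Tail loop: leftover elements that do not fill a whole group.
    for i in range(n_vec, len(values)):
        out[i] = clamp_e4m3(values[i] * inv_scale)

    return out
```

For example, `quantize_scaled([1.0, 2.0, -3.0, 1000.0, 0.5], 2.0)` processes the first four values as one group and the last value in the tail loop, clamping the out-of-range value to 448.0.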
-
- 09 Jun, 2024 1 commit
-
-
bnellnm authored
-
- 22 May, 2024 1 commit
-
-
Michael Goin authored
-