- 25 Mar, 2025 1 commit
-
-
Lu Fang authored
Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
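The commit title above does not say which index type was changed or why. As general background only, the sketch below shows one common reason kernel index types get widened: a flat offset computed in 32-bit arithmetic wraps once the tensor has more than 2^32 elements. The helper names `flat_index_32` and `flat_index_64` are hypothetical and do not come from vLLM; unsigned arithmetic is used so the wraparound is well defined in C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from vLLM): flat offset of element (row, col)
// computed in 32-bit unsigned arithmetic. The product wraps modulo 2^32.
std::uint32_t flat_index_32(std::uint32_t row, std::uint32_t col,
                            std::uint32_t row_stride) {
    return row * row_stride + col;
}

// Same computation with a 64-bit index type. The product is exact for
// any realistic tensor size.
std::int64_t flat_index_64(std::int64_t row, std::int64_t col,
                           std::int64_t row_stride) {
    return row * row_stride + col;
}
```

With `row = 70000`, `col = 0`, and `row_stride = 70000`, the true offset is 4,900,000,000. The 64-bit helper returns that value, while the 32-bit helper returns 605,032,704, which would silently address the wrong element.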
-
- 28 Jan, 2025 1 commit
-
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
- 27 Nov, 2024 1 commit
-
-
Tyler Michael Smith authored
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
-
- 20 Nov, 2024 1 commit
-
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
-
- 17 Oct, 2024 1 commit
-
-
bnellnm authored
-
- 04 Oct, 2024 1 commit
-
-
Lucas Wilkinson authored
-
- 02 Aug, 2024 1 commit
-
-
Lucas Wilkinson authored
-
- 31 Jul, 2024 1 commit
-
-
HandH1998 authored
-
- 30 Jul, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 21 Jul, 2024 1 commit
-
-
Alexander Matveev authored
-
- 18 Jun, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 14 Jun, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 09 Jun, 2024 1 commit
-
-
bnellnm authored
-
- 31 May, 2024 2 commits
-
-
Simon Mo authored
Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)" (#5149)
-
Alexander Matveev authored
[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5) (#5136)
-
- 23 May, 2024 1 commit
-
-
Alexander Matveev authored
-
- 22 May, 2024 1 commit
-
-
Michael Goin authored
-
- 16 May, 2024 1 commit
-
-
Alexander Matveev authored
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
-
- 24 Apr, 2024 1 commit
-
-
alexm-nm authored
This PR fixes the Marlin kernel crash on H100 reported in neuralmagic#187. The crash was caused by the inline PTX assembly that issued async_copy with streaming behavior. The fix is to use the more standard PTX for async_copy (without the fractional L2 policy for "evict_first"). The standard async_copy PTX performs the same as the previous version.
-
- 01 Mar, 2024 1 commit
-
-
Robert Shaw authored
Co-authored-by: Robert Shaw <114415538+rib-2@users.noreply.github.com>
Co-authored-by: alexm <alexm@neuralmagic.com>
-