- 28 Apr, 2024 1 commit
-
-
DefTruth authored
-
- 27 Apr, 2024 13 commits
-
-
Robert Shaw authored
-
Nick Hill authored
-
Prashant Gupta authored
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Travis Johnson <tjohnson31415@gmail.com>
-
Nick Hill authored
Co-authored-by: DefTruth <31974251+deftruth@users.noreply.github.com>
-
Ruoyu Qin authored
-
Roy authored
-
Caio Mendes authored
-
Austin Veselka authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
Hongxia Yang authored
-
Roy authored
-
Cyrus Leung authored
-
Philipp Moritz authored
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Nick Hill authored
-
- 26 Apr, 2024 7 commits
-
-
youkaichao authored
Co-authored-by: Simon Mo <simon.mo@hey.com>
-
Cody Yu authored
-
SangBin Cho authored
-
SangBin Cho authored
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
-
Norman Mu authored
-
Cyrus Leung authored
-
Hongxia Yang authored
Co-authored-by: WoosukKwon <woosuk.kwon@berkeley.edu>
-
- 25 Apr, 2024 10 commits
-
-
Nick Hill authored
-
Nick Hill authored
-
Roy authored
-
SangBin Cho authored
-
Kunshang Ji authored
-
Caio Mendes authored
Co-authored-by: Caio Mendes <caiocesart@microsoft.com>
-
Alexei-V-Ivanov-AMD authored
-
Isotr0py authored
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Nick Hill authored
-
Caio Mendes authored
-
- 24 Apr, 2024 9 commits
-
-
zifeitong authored
-
youkaichao authored
Co-authored-by: Simon Mo <simon.mo@hey.com>
-
alexm-nm authored
This PR fixes the Marlin kernel crash on H100 that was reported in neuralmagic#187. The crash was caused by the inline PTX assembly that introduced async_copy with streaming behavior. The fix is to use the more standard PTX for async_copy (without the fractional L2 policy for "evict_first"). There is no performance difference between the standard async_copy PTX and the previous version.
-
Roger Wang authored
-
youkaichao authored
[Core][Distributed] use existing torch.cuda.device context manager (#4318)
-
Woosuk Kwon authored
-
youkaichao authored
-
youkaichao authored
-
Robert Shaw authored
Fixes the fp8 interface, which broke in the AQLM merge.
-