  • [Bugfix] Fix marlin kernel crash on H100 (#4218) · aae08249
    alexm-nm authored
    This PR fixes the Marlin kernel crash on H100 reported in neuralmagic#187.
    The crash was caused by the inline PTX assembly that issued async_copy with streaming behavior. The fix is to use the more standard PTX form of async_copy, without the fractional L2 "evict_first" policy. There is no performance difference between the standard async_copy PTX and the previous version.
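
The change described above can be sketched roughly as follows. This is an illustration only, not the PR's actual diff: the helper names `cp_async16_evict_first` and `cp_async16` are hypothetical, the 16-byte copy size is an assumption, and both helpers assume a target of sm_80 or newer, where `cp.async` and `createpolicy` are available. The first helper shows an async copy carrying a fractional L2 "evict_first" cache hint; the second shows the plain form without a cache policy.

```cuda
#include <cstdint>

// Hypothetical helper: 16-byte global-to-shared async copy that carries a
// fractional L2 "evict_first" cache policy (the streaming-style variant).
__device__ inline void cp_async16_evict_first(void* smem_ptr,
                                              const void* glob_ptr) {
  const int BYTES = 16;  // assumed copy size for this sketch
  // cp.async takes a 32-bit shared-memory address.
  uint32_t smem = static_cast<uint32_t>(__cvta_generic_to_shared(smem_ptr));
  asm volatile(
      "{\n"
      "  .reg .b64 p;\n"
      "  createpolicy.fractional.L2::evict_first.b64 p, 1.0;\n"
      "  cp.async.cg.shared.global.L2::cache_hint [%0], [%1], %2, p;\n"
      "}\n" ::"r"(smem), "l"(glob_ptr), "n"(BYTES));
}

// Hypothetical helper: the same copy using the standard cp.async form,
// with no L2 cache policy attached.
__device__ inline void cp_async16(void* smem_ptr, const void* glob_ptr) {
  const int BYTES = 16;  // assumed copy size for this sketch
  uint32_t smem = static_cast<uint32_t>(__cvta_generic_to_shared(smem_ptr));
  asm volatile(
      "{\n"
      "  cp.async.cg.shared.global [%0], [%1], %2;\n"
      "}\n" ::"r"(smem), "l"(glob_ptr), "n"(BYTES));
}
```

In a sketch like this, call sites that used the hinted helper would switch to the plain one. The surrounding `cp.async.commit_group` / `cp.async.wait_group` synchronization would stay unchanged.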
marlin_cuda_kernel.cu 44.3 KB