    ggml: Avoid cudaMemsetAsync during memory fitting · 392a2702
    Jesse Gross authored
    We pass invalid pointers when we check the size of the required
    compute graph before fitting. Some CUDA APIs validate these pointers,
    but we can simply skip those calls during this phase. cudaMemsetAsync
    is one such API: we were not skipping it, but until now we never took
    the code path that used it. Now that op_offload is enabled, we can
    hit it in memory-pressured situations.
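
The guard described above can be illustrated with a minimal sketch. This is not the code from the patch: `fit_context`, its `no_alloc` flag, and `guarded_memset` are hypothetical names, and `std::memset` stands in for `cudaMemsetAsync` so that the example runs without a GPU.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical context (not from the patch). `no_alloc` is true while the
// compute graph is only being sized; buffer pointers are then placeholders
// and must not be handed to an API that validates them.
struct fit_context {
    bool no_alloc = false;
    int  memsets_issued = 0;  // counter kept for illustration only
};

// Stand-in for a wrapper around cudaMemsetAsync. std::memset is used here
// purely so the sketch is runnable on the host.
inline void guarded_memset(fit_context & ctx, void * dst, int value, std::size_t size) {
    if (ctx.no_alloc) {
        return;  // sizing pass: dst may be invalid, so skip the call entirely
    }
    std::memset(dst, value, size);
    ctx.memsets_issued++;
}
```

With `no_alloc` set, the call returns before the pointer is ever touched, which is the behaviour the sizing pass relies on. Once real buffers exist, the same call site performs the memset as usual.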
0022-ggml-No-alloc-mode.patch 26 KB