ggml: Avoid cudaMemsetAsync during memory fitting
We pass invalid pointers when we check the size of the required compute graph before fitting. Some CUDA APIs validate these pointers but we can just skip them during this phase. cudaMemsetAsync is one of these that we weren't skipping but never took the code path that used it. Now that we have enabled op_offload, we can hit it in memory pressured situations.
Showing
Please register or sign in to comment