    ggml: Preallocate CUDA pool memory · 3d0b1734
    Jesse Gross authored
    The GGML CUDA backend allocates additional memory for intermediate
    results during calculation. This memory isn't currently allocated
    during worst-case graph reservation and is therefore not included in
    scheduling. Since these buffers can grow with context length, we
    could crash.
    
    This extends the memory allocation system down a layer, from the
    GGML graph to the CUDA backend, preallocating the worst-case memory
    there as well.
    
    Fixes #11753
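    A minimal sketch of the pattern the commit describes, using
    hypothetical names rather than the actual ggml CUDA pool API: during
    worst-case graph reservation the pool runs in a "no-alloc" mode that
    only records the peak size requested; a single commit step then
    preallocates that worst case, so real allocations during inference
    never grow the pool mid-run.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Hypothetical illustration, not the ggml implementation.
    class WorstCasePool {
    public:
        // Enter reservation ("no-alloc") mode: track sizes, don't allocate.
        void begin_reserve() { reserving_ = true; peak_ = 0; used_ = 0; }

        // In reserve mode this only advances the high-water mark; in normal
        // mode it hands out an offset into the preallocated buffer.
        size_t alloc(size_t n) {
            size_t off = used_;
            used_ += n;
            peak_ = std::max(peak_, used_);
            if (!reserving_) {
                assert(used_ <= buf_.size()); // never grows after commit()
            }
            return off;
        }

        // Release all intermediate allocations between graph evaluations.
        void free_all() { used_ = 0; }

        // Preallocate the recorded worst case in one shot.
        void commit() { buf_.resize(peak_); reserving_ = false; }

        size_t capacity() const { return buf_.size(); }

    private:
        bool reserving_ = false;
        size_t peak_ = 0, used_ = 0;
        std::vector<char> buf_;
    };
    ```

    Because the reservation pass sees the worst-case graph, the committed
    capacity covers every later allocation pattern, which is what brings
    these buffers into the scheduler's memory accounting.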
0022-ggml-No-alloc-mode.patch 25.6 KB