    ggml: Fix memory leak on input tensors · f50d6912
    Jesse Gross authored
    For every forward pass through the model, we need to allocate input
    tensors: tokens, images, positions, outputs and masks. These get
    allocated in system memory.
    
    However, when we close the context through which the tensors were
    allocated, the tensor metadata is freed but the backing backend memory
    is not. Because this happens on every forward pass, the result is a
    significant memory leak.
    
    This change ensures that all memory allocated through a context,
    including the backend buffers, is freed when the context is closed.
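    The ownership pattern behind this fix can be sketched as a small,
    self-contained Go program. This is a minimal illustration under stated
    assumptions, not the code in ggml.go: the types `Context`,
    `backendBuffer` and `allocator` and the method `Input` are hypothetical
    stand-ins. The idea is that the context records every backend buffer it
    allocates, and `Close` frees those buffers as well as dropping the
    tensor metadata.

```go
package main

import "fmt"

// backendBuffer is a HYPOTHETICAL stand-in for memory owned by a compute
// backend (e.g. a host or GPU buffer). It is not the real ggml type.
type backendBuffer struct {
	id    int
	size  int
	freed bool
}

// allocator is a HYPOTHETICAL backend allocator that tracks every live
// buffer, so a leak shows up as a non-empty live map.
type allocator struct {
	nextID int
	live   map[int]*backendBuffer
}

func newAllocator() *allocator {
	return &allocator{live: make(map[int]*backendBuffer)}
}

func (a *allocator) alloc(size int) *backendBuffer {
	a.nextID++
	b := &backendBuffer{id: a.nextID, size: size}
	a.live[b.id] = b
	return b
}

func (a *allocator) free(b *backendBuffer) {
	if b.freed {
		return // freeing twice is a no-op
	}
	b.freed = true
	delete(a.live, b.id)
}

// tensor pairs lightweight metadata with the backend buffer holding its data.
type tensor struct {
	name string
	buf  *backendBuffer
}

// Context is a HYPOTHETICAL per-forward-pass context. Besides the tensor
// metadata, it remembers every backend buffer allocated through it.
type Context struct {
	alloc   *allocator
	tensors []*tensor
	buffers []*backendBuffer
}

// Input allocates an input tensor of the given size in bytes.
func (c *Context) Input(name string, size int) *tensor {
	b := c.alloc.alloc(size)
	c.buffers = append(c.buffers, b) // record ownership so Close can free it
	t := &tensor{name: name, buf: b}
	c.tensors = append(c.tensors, t)
	return t
}

// Close frees the backend buffers and then drops the metadata.
// Clearing only c.tensors would leave every buffer in a.live (the leak).
func (c *Context) Close() {
	for _, b := range c.buffers {
		c.alloc.free(b)
	}
	c.buffers = nil
	c.tensors = nil
}

func main() {
	a := newAllocator()
	for pass := 0; pass < 3; pass++ {
		ctx := &Context{alloc: a}
		ctx.Input("tokens", 512*4)
		ctx.Input("positions", 512*4)
		ctx.Input("mask", 512*512*4)
		ctx.Close()
	}
	// prints "live buffers after 3 passes: 0"
	fmt.Println("live buffers after 3 passes:", len(a.live))
}
```

    If `Close` only cleared `c.tensors`, the live map would grow by three
    buffers per pass, which mirrors the per-forward-pass leak described
    above.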
    
    Fixes #10040