ml/backend/ggml/ggml.go · f50d691254e671e69975c4e54fc4d0469b538f10 · OpenDAS / ollama

ggml: Fix memory leak on input tensors · f50d6912

Jesse Gross authored Apr 08, 2025

For every forward pass through the model, we need to allocate input
tensors: tokens, images, positions, outputs and masks. These get
allocated in system memory.

However, when we close the context that the tensors were allocated
through, the metadata gets freed but the actual backend memory does
not. This results in a significant memory leak.
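The leak can be modeled with a toy allocator. This is a minimal sketch, not ollama's code: `backendAllocator`, `leakyContext` and `newTensor` are hypothetical names, and a byte counter stands in for real backend buffers. Closing the context drops the tensor metadata but never returns the backend memory, so each forward pass adds to the live total.

```go
package main

import "fmt"

// backendAllocator is a hypothetical stand-in for backend (system) memory.
// It only tracks how many bytes are currently live.
type backendAllocator struct {
	live int
}

func (a *backendAllocator) alloc(n int) []byte {
	a.live += n
	return make([]byte, n)
}

func (a *backendAllocator) free(b []byte) {
	a.live -= len(b)
}

// tensor pairs metadata (the name) with a reference to backend data.
type tensor struct {
	name string
	data []byte
}

// leakyContext models the bug: Close drops the tensor metadata
// but never hands the backend memory back to the allocator.
type leakyContext struct {
	backend *backendAllocator
	tensors []*tensor
}

func (c *leakyContext) newTensor(name string, size int) *tensor {
	t := &tensor{name: name, data: c.backend.alloc(size)}
	c.tensors = append(c.tensors, t)
	return t
}

func (c *leakyContext) Close() {
	c.tensors = nil // metadata is dropped; backend.free is never called
}

func main() {
	backend := &backendAllocator{}
	for pass := 0; pass < 3; pass++ {
		// One context per forward pass, holding that pass's input tensors.
		ctx := &leakyContext{backend: backend}
		for _, name := range []string{"tokens", "images", "positions", "outputs", "masks"} {
			ctx.newTensor(name, 1024)
		}
		ctx.Close()
		fmt.Printf("after pass %d: %d bytes still live\n", pass, backend.live)
	}
}
```

Running it reports 5120, 10240 and 15360 bytes still live after the three passes: the live total grows by one pass's worth of input tensors each time.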

This change ensures that all memory allocated through a context,
including the backend memory, is freed when the context is closed.
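The fix can be sketched with the same kind of toy model. This is an illustration under stated assumptions, not the actual change to ggml.go: `backendAllocator`, `ownedContext` and `newTensor` are hypothetical names. The context records every backend buffer allocated through it, and `Close` releases those buffers before dropping the metadata.

```go
package main

import "fmt"

// backendAllocator is a hypothetical stand-in for backend memory that
// tracks how many bytes are currently live.
type backendAllocator struct {
	live int
}

func (a *backendAllocator) alloc(n int) []byte {
	a.live += n
	return make([]byte, n)
}

func (a *backendAllocator) free(b []byte) {
	a.live -= len(b)
}

// ownedContext records every backend buffer allocated through it so
// that Close can release them together with the tensor metadata.
type ownedContext struct {
	backend *backendAllocator
	buffers [][]byte
	names   []string
}

func (c *ownedContext) newTensor(name string, size int) []byte {
	buf := c.backend.alloc(size)
	c.buffers = append(c.buffers, buf)
	c.names = append(c.names, name)
	return buf
}

// Close frees the backend memory first, then drops the metadata.
// Calling it twice is harmless because the buffer list is cleared.
func (c *ownedContext) Close() {
	for _, buf := range c.buffers {
		c.backend.free(buf)
	}
	c.buffers = nil
	c.names = nil
}

func main() {
	backend := &backendAllocator{}
	for pass := 0; pass < 3; pass++ {
		// One context per forward pass, holding that pass's input tensors.
		ctx := &ownedContext{backend: backend}
		for _, name := range []string{"tokens", "images", "positions", "outputs", "masks"} {
			ctx.newTensor(name, 1024)
		}
		ctx.Close()
		fmt.Printf("after pass %d: %d bytes still live\n", pass, backend.live)
	}
}
```

Each pass now reports 0 bytes still live. Tying buffer ownership to the context means callers only have to close the context; they never free individual tensors.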

Fixes #10040
