Commit 34c3b68f, authored and committed by Jesse Gross

ggml: Don't allocate CPU buffers as CUDA Host buffers

Allocating (and in particular, freeing) memory from CUDA host buffers
is expensive and can cause a significant performance hit if we do
it for every token. Using normal system memory avoids this issue
and also gives the OS more flexibility to manage it.

There is no performance impact from this patch directly (either
positive or negative) but it makes a difference once we start
freeing memory correctly.
parent f33ccd5d
@@ -384,12 +384,6 @@ func New(ctx context.Context, r *os.File, params ml.BackendParams) (ml.Backend,
 	for _, d := range append(gpus, append(accels, cpus...)...) {
 		b := C.ggml_backend_dev_init(d, nil)
 		bt := C.ggml_backend_get_default_buffer_type(b)
-		if d := C.ggml_backend_get_device(b); C.ggml_backend_dev_type(d) == C.GGML_BACKEND_DEVICE_TYPE_CPU && len(gpus) > 0 {
-			// use the first gpu host buffer type for gpu if possible
-			if hbt := C.ggml_backend_dev_host_buffer_type(gpus[0]); hbt != nil {
-				bt = hbt
-			}
-		}
 		deviceBufferTypes[d] = bt
......
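
The effect of the removed branch can be illustrated with a minimal pure-Go sketch. This is not the project's code: `Device`, `DeviceType`, `bufferTypesBefore`, `bufferTypesAfter`, and the buffer type names are hypothetical stand-ins for the cgo handles in the hunk above, used only to show which buffer type each device ends up with before and after the change.

```go
package main

import "fmt"

// DeviceType is a hypothetical stand-in for the ggml device type enum.
type DeviceType int

const (
	DeviceCPU DeviceType = iota
	DeviceGPU
	DeviceAccel
)

// Device is a hypothetical pure-Go model of a backend device.
// Buffer types are modelled as plain strings for illustration.
type Device struct {
	Name          string
	Type          DeviceType
	DefaultBuffer string // the device's default buffer type
	HostBuffer    string // pinned host buffer type, "" if the device has none
}

// allDevices returns gpus, then accels, then cpus in a fresh slice,
// so the caller's slices are never modified.
func allDevices(gpus, accels, cpus []Device) []Device {
	out := make([]Device, 0, len(gpus)+len(accels)+len(cpus))
	out = append(out, gpus...)
	out = append(out, accels...)
	return append(out, cpus...)
}

// bufferTypesBefore models the old behaviour: when at least one GPU is
// present, a CPU device is given the first GPU's host buffer type.
func bufferTypesBefore(gpus, accels, cpus []Device) map[string]string {
	out := make(map[string]string)
	for _, d := range allDevices(gpus, accels, cpus) {
		bt := d.DefaultBuffer
		if d.Type == DeviceCPU && len(gpus) > 0 && gpus[0].HostBuffer != "" {
			bt = gpus[0].HostBuffer
		}
		out[d.Name] = bt
	}
	return out
}

// bufferTypesAfter models the behaviour after this commit: every device,
// including the CPU, keeps its own default buffer type.
func bufferTypesAfter(gpus, accels, cpus []Device) map[string]string {
	out := make(map[string]string)
	for _, d := range allDevices(gpus, accels, cpus) {
		out[d.Name] = d.DefaultBuffer
	}
	return out
}

func main() {
	// Hypothetical device names and buffer type names.
	gpus := []Device{{Name: "CUDA0", Type: DeviceGPU, DefaultBuffer: "CUDA0", HostBuffer: "CUDA_Host"}}
	cpus := []Device{{Name: "CPU", Type: DeviceCPU, DefaultBuffer: "CPU"}}

	fmt.Println("before:", bufferTypesBefore(gpus, nil, cpus)["CPU"])
	fmt.Println("after: ", bufferTypesAfter(gpus, nil, cpus)["CPU"])
}
```

In this model the only observable difference is the buffer type assigned to the CPU device when a GPU is present, which matches the commit message: CPU tensors move from pinned host memory to normal system memory, and nothing else changes.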