llama/patches/0026-ggml-Backport-scale-kernel-fixes.patch · efaee8c2d658f7f40a2f44b411ebfb25fcc198b0 · OpenDAS / ollama

ggml: Backport scale kernel fixes · efaee8c2

Jesse Gross authored Sep 23, 2025

The GGML scale kernel uses signed 32-bit ints to represent
the number of elements in the tensor. For large images, the
element count in mistral-small3.2 overflows this type,
triggering CUDA errors due to negative arguments.

Currently, this can happen when the user passes a large image
to mistral-small3.2. However, with upcoming changes to reserve
CUDA memory, it will happen every time mistral-small is loaded,
because we reserve using a worst-case batch.

This patch is part of an upstream GGML commit and should be removed
after GGML is updated past 0a1b398 "ggml: add ops for WAN video model
(cuda && cpu) (#15669)".

Fixes #10388


0026-ggml-Backport-scale-kernel-fixes.patch 2.48 KB
