    ggml: Backport scale kernel fixes · efaee8c2
    Jesse Gross authored
    The GGML scale kernel uses a signed 32-bit int to represent
    the number of elements in the tensor. With large images,
    mistral-small3.2 overflows this count, and the resulting
    negative arguments trigger CUDA errors.
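
The failure mode described above can be sketched on the host side in plain C++. This is an illustration only, not the contents of the patch: the block size `kBlockSize` and the helper names `narrow_to_i32`, `num_blocks_i32`, and `num_blocks_i64` are hypothetical and are not taken from GGML.

```cpp
#include <cstdint>

// Hypothetical block size, chosen only for illustration.
constexpr int64_t kBlockSize = 256;

// Narrow a 64-bit element count to a signed 32-bit int, as happens when the
// count is passed through an `int` parameter. The value wraps modulo 2^32
// (guaranteed since C++20; mainstream compilers behave this way earlier too).
inline int32_t narrow_to_i32(int64_t n) {
    return static_cast<int32_t>(static_cast<uint32_t>(static_cast<uint64_t>(n)));
}

// Block count derived from the 32-bit element count (overflow-prone form).
inline int64_t num_blocks_i32(int64_t n) {
    const int32_t k = narrow_to_i32(n);
    return (static_cast<int64_t>(k) + kBlockSize - 1) / kBlockSize;
}

// Block count derived from the full 64-bit element count.
inline int64_t num_blocks_i64(int64_t n) {
    return (n + kBlockSize - 1) / kBlockSize;
}
```

With a count of 3,000,000,000 elements, which is above the signed 32-bit maximum of 2,147,483,647, `narrow_to_i32` returns a negative value, so `num_blocks_i32` is negative as well, while `num_blocks_i64` stays positive. A negative element count or launch dimension is the kind of argument that would make a CUDA call fail.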
    
    Currently, this can happen when the user passes a large image
    to mistral-small3.2. However, with upcoming changes to reserve
    CUDA memory, it will happen every time mistral-small is loaded,
    because we reserve using a worst-case batch.
    
    This patch is part of an upstream GGML commit and should be removed
    after GGML is updated past 0a1b398 "ggml: add ops for WAN video model
    (cuda && cpu) (#15669)".
    
    Fixes #10388
0026-ggml-Backport-scale-kernel-fixes.patch 2.48 KB