    llm: Avoid integer underflow on llama engine memory layout · 172b5924
    Jesse Gross authored
    On the llama engine, when we compute the memory layout, we reserve
    a buffer to leave some headroom for incorrect memory estimates.
    This reserve is subtracted from the GPU's free memory. On GPUs with
    limited memory, free memory can be smaller than the reserve, and the
    subtraction may underflow.
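A minimal sketch of the failure mode and a guarded alternative follows. It assumes, as is common for byte counts in Go, that free memory and the reserve are held as `uint64`. The helper `availableAfterReserve` and the sizes used are hypothetical and illustrative only; they are not the code changed by this commit.

```go
package main

import "fmt"

// availableAfterReserve is a hypothetical helper: it subtracts a safety
// reserve from free GPU memory, clamping at zero instead of letting the
// unsigned subtraction wrap around when the reserve exceeds free memory.
func availableAfterReserve(free, reserve uint64) uint64 {
	if reserve >= free {
		return 0 // not enough memory to cover the reserve
	}
	return free - reserve
}

func main() {
	const mib = uint64(1) << 20
	free := 256 * mib    // small GPU: 256 MiB free (illustrative)
	reserve := 512 * mib // reserve larger than free memory (illustrative)

	// Naive unsigned subtraction wraps around to a huge value.
	fmt.Println(free - reserve) // 18446744073441116160

	// Guarded subtraction reports no usable memory instead.
	fmt.Println(availableAfterReserve(free, reserve)) // 0

	// With enough free memory the result is the ordinary difference.
	fmt.Println(availableAfterReserve(1024*mib, reserve) / mib) // 512
}
```

Clamping to zero means a GPU that cannot cover the reserve is treated as having no usable memory, rather than appearing to have nearly 2^64 bytes free.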
    
    Fixes #13494