llm: Avoid integer underflow on llama engine memory layout
On the llama engine, when we compute the memory layout, we reserve a buffer to leave some headroom for incorrect estimates. This buffer is subtracted from the GPU's free memory, and on GPUs with limited memory the free amount can be smaller than the buffer, so the subtraction may underflow.

Fixes #13494
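As a minimal sketch of the failure mode described above, and not the actual change in this commit, the following assumes that memory sizes are held as `uint64` byte counts. The names `reserveBuffer` and `usableMemory` and the 512 MiB figure are hypothetical, chosen only for illustration. With unsigned values, a plain `free - reserve` wraps around to a very large number when `free < reserve`, so the subtraction is clamped at zero instead:

```go
package main

// reserveBuffer is a hypothetical headroom value used only for this
// illustration; the real buffer size is not stated in the commit message.
const reserveBuffer uint64 = 512 << 20 // 512 MiB

// usableMemory returns free minus reserve, clamped at zero.
// Without the guard, free - reserve on uint64 values would wrap around
// to a huge number whenever free is smaller than reserve.
func usableMemory(free, reserve uint64) uint64 {
	if free < reserve {
		return 0
	}
	return free - reserve
}
```

Under these assumptions, a GPU reporting 256 MiB free would yield `usableMemory(256<<20, reserveBuffer) == 0` rather than a wrapped value close to the `uint64` maximum.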