    llm: Change memory allocation backoff from exponential to incremental · ad6f6a1d
    Jesse Gross authored
    If we create a memory layout that should fit based on reported free VRAM
    but allocation still fails, we start applying a backoff. This reduces
    free VRAM by an exponential percentage (1%, 2%, 4%...). However, the
    points chosen tend to be too dense at the beginning and too sparse at
    the end. Therefore, this switches to an incremental backoff (10%, 20%,
    30%...).
server.go 52.8 KB