    llm: New memory management · d5a0d8d9
    Jesse Gross authored
    This changes the memory allocation strategy from upfront estimation to
    tracking actual allocations done by the engine and reacting to that. The
    goal is to avoid issues caused by both under-estimation (crashing) and
    over-estimation (low performance due to under-utilized GPUs).
    
    It is currently opt-in and can be enabled for models running on the
    Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other
    cases is unchanged and will continue to use the existing estimates.
amd_linux.go 17.2 KB