    llm: Consistently track unassigned model data · a2cc8571
    Jesse Gross authored
    In some cases, if we fail to assign a piece of the model to a GPU then
    we lose track of this data. Although it doesn't change the memory
    allocation, it does affect the total size of the model reported by
    tools such as ollama ps (and also the percent offloaded).
    
    As a result, it can look as though setting num_gpu is not reflected in
    ollama ps. The setting does take effect, but the reported offload
    percentage may appear not to change.
    
    Spreading the model across more GPUs will continue to impact the
    reported total size of the model.
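
The accounting idea described above can be sketched as follows. This is a minimal illustration, not the code in memory.go: the names `layer`, `summary`, `place`, and `percentOffloaded` are hypothetical, the placement is a simple first-fit, and per-GPU overhead (the reason spreading across more GPUs still changes the reported total) is not modelled. The point is only that a piece which fits on no GPU is still added to the total, so the offload percentage moves when GPU capacity changes.

```go
package main

// layer is a hypothetical stand-in for one piece of model data.
type layer struct {
	size uint64 // bytes
}

// summary holds the two figures a tool such as ollama ps would report
// in this sketch: the total model size and the part placed on GPUs.
type summary struct {
	total uint64 // assigned + unassigned bytes
	onGPU uint64 // bytes that were placed on some GPU
}

// place assigns each layer to the first GPU with enough free memory.
// A layer that fits nowhere stays unassigned but is still counted in
// total, which is the behaviour the commit message asks for.
func place(layers []layer, free []uint64) summary {
	// Work on a copy so the caller's slice is not modified.
	remaining := append([]uint64(nil), free...)

	var s summary
	for _, l := range layers {
		s.total += l.size // counted whether or not assignment succeeds
		for i := range remaining {
			if remaining[i] >= l.size {
				remaining[i] -= l.size
				s.onGPU += l.size
				break
			}
		}
	}
	return s
}

// percentOffloaded returns the share of the model on GPUs as a whole
// percentage, rounded down.
func (s summary) percentOffloaded() uint64 {
	if s.total == 0 {
		return 0
	}
	return s.onGPU * 100 / s.total
}
```

With three layers of equal size, calling `place` with room for two and then with room for one keeps `total` the same while `percentOffloaded()` drops, which is the change that should be visible when num_gpu is lowered.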
memory.go 12.1 KB