    sched: Add support for grouping GPUs (#10678) · ea7657b5
    Daniel Andersen authored
This patch modifies Ollama to group GPUs and memory-fit the requested model to the smallest suitable group, instead of the former behavior of either using a single GPU or distributing the model over all available GPUs.
    
    Benefits:
 - Less (PCIe-)bus communication between GPUs, which matters especially when the links are not very fast
 - Allows unallocated GPUs to enter power-saving mode
 - Significantly reduces VRAM allocation when using more than 2 GPUs in a system
 - Because of the reduced memory allocation, more models can run simultaneously
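
    The core idea can be sketched as follows. This is only an illustrative sketch, not the actual sched.go code: the `gpu` type, `pickGroup` function, and the "fewest devices with the most free VRAM first" selection rule are assumptions made for the example.

    ```go
    package main

    import (
    	"fmt"
    	"sort"
    )

    // gpu is a hypothetical stand-in for the scheduler's per-device info.
    type gpu struct {
    	ID       int
    	FreeVRAM uint64 // bytes
    }

    // pickGroup returns the smallest group of GPUs whose combined free VRAM
    // fits the model, preferring a small subset over spreading the model
    // across every available device. It returns nil if no group fits.
    func pickGroup(gpus []gpu, required uint64) []gpu {
    	// Sort by free VRAM descending so that, for each group size, we try
    	// the devices with the most headroom and leave the rest idle.
    	sorted := append([]gpu(nil), gpus...)
    	sort.Slice(sorted, func(i, j int) bool {
    		return sorted[i].FreeVRAM > sorted[j].FreeVRAM
    	})
    	for n := 1; n <= len(sorted); n++ {
    		group := sorted[:n]
    		var total uint64
    		for _, g := range group {
    			total += g.FreeVRAM
    		}
    		if total >= required {
    			return group
    		}
    	}
    	return nil
    }

    func main() {
    	gpus := []gpu{{0, 24 << 30}, {1, 24 << 30}, {2, 16 << 30}, {3, 16 << 30}}
    	// A 30 GiB model fits on two of the 24 GiB cards; the other GPUs stay unallocated.
    	if group := pickGroup(gpus, 30<<30); group != nil {
    		fmt.Println("selected GPUs:", group)
    	}
    }
    ```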