    sched: Add support for grouping GPUs (#10678) · ea7657b5
    Daniel Andersen authored
This patch modifies Ollama to group GPUs and memory-fit the requested model to the smallest suitable group, instead of the former behavior of either using a single GPU or distributing the model over all available GPUs.
    
    Benefits:
 - Less (PCIe-)bus communication between GPUs, which matters especially when the links are not very fast
 - Allows unallocated GPUs to enter power-saving mode
 - Significantly reduces VRAM allocation when using more than 2 GPUs in a system
 - Because of the reduced memory allocation, more models can run simultaneously
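
    The core idea can be sketched as follows. This is only an illustrative sketch, not the actual sched.go code: the `gpu` type, `pickGroup` function, and the "fewest devices with the most free VRAM first" selection rule are assumptions made for the example.

    ```go
    package main

    import (
    	"fmt"
    	"sort"
    )

    // gpu is a hypothetical stand-in for the scheduler's per-device info.
    type gpu struct {
    	ID       int
    	FreeVRAM uint64 // bytes
    }

    // pickGroup returns the smallest group of GPUs whose combined free VRAM
    // fits the model, preferring a small subset over spreading the model
    // across every available device. It returns nil if no group fits.
    func pickGroup(gpus []gpu, required uint64) []gpu {
    	// Sort by free VRAM descending so that, for each group size, we try
    	// the devices with the most headroom and leave the rest idle.
    	sorted := append([]gpu(nil), gpus...)
    	sort.Slice(sorted, func(i, j int) bool {
    		return sorted[i].FreeVRAM > sorted[j].FreeVRAM
    	})
    	for n := 1; n <= len(sorted); n++ {
    		group := sorted[:n]
    		var total uint64
    		for _, g := range group {
    			total += g.FreeVRAM
    		}
    		if total >= required {
    			return group
    		}
    	}
    	return nil
    }

    func main() {
    	gpus := []gpu{{0, 24 << 30}, {1, 24 << 30}, {2, 16 << 30}, {3, 16 << 30}}
    	// A 30 GiB model fits on two of the 24 GiB cards; the other GPUs stay unallocated.
    	if group := pickGroup(gpus, 30<<30); group != nil {
    		fmt.Println("selected GPUs:", group)
    	}
    }
    ```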