llm: prevent loading too large models on windows (#5926)

Don't allow loading models that would lead to memory exhaustion (across VRAM, system memory and disk paging). This check was already applied on Linux and should be applied on Windows as well.

Commit 25906d72, authored Aug 11, 2024 by Daniel Hiltgen, committed by GitHub Aug 11, 2024 (unverified)

Showing with 3 additions and 2 deletions

llm/server.go +3 -2

--- a/llm/server.go
+++ b/llm/server.go
@@ -125,8 +125,9 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
 		}
 	}
-	// On linux, over-allocating CPU memory will almost always result in an error
+	// On linux and windows, over-allocating CPU memory will almost always result in an error
-	if runtime.GOOS == "linux" {
+	// Darwin has fully dynamic swap so has no direct concept of free swap space
+	if runtime.GOOS != "darwin" {
 		systemMemoryRequired := estimate.TotalSize - estimate.VRAMSize
 		available := systemFreeMemory + systemSwapFreeMemory
 		if systemMemoryRequired > available {
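
To make the effect of the changed condition easier to see, here is a minimal standalone sketch of the check, not the code from llm/server.go. The type `memoryEstimate`, the function `checkSystemMemory`, its parameters, and the error text are hypothetical names chosen for illustration; only the arithmetic (total size minus VRAM size, compared against free memory plus free swap) and the `darwin` exclusion come from the diff. The underflow guard is an extra assumption and is not part of the commit.

```go
package main

import (
	"fmt"
	"runtime"
)

// memoryEstimate is a hypothetical stand-in for the estimate value used in the diff.
type memoryEstimate struct {
	TotalSize uint64 // bytes needed for the whole model
	VRAMSize  uint64 // bytes expected to be placed in GPU memory
}

// checkSystemMemory returns an error when the part of the model that must live
// in system memory exceeds free RAM plus free swap. The OS name is passed in
// explicitly so the behavior can be exercised on any platform.
func checkSystemMemory(goos string, est memoryEstimate, freeMem, freeSwap uint64) error {
	// Per the diff's comment, Darwin has fully dynamic swap, so the check is skipped.
	if goos == "darwin" {
		return nil
	}
	// Assumption (not in the diff): guard against unsigned underflow.
	if est.VRAMSize >= est.TotalSize {
		return nil
	}
	required := est.TotalSize - est.VRAMSize
	available := freeMem + freeSwap
	if required > available {
		return fmt.Errorf("model needs %d bytes of system memory, only %d bytes available", required, available)
	}
	return nil
}

func main() {
	const gib = uint64(1) << 30
	est := memoryEstimate{TotalSize: 40 * gib, VRAMSize: 8 * gib}
	// 32 GiB required against 16 GiB RAM + 4 GiB swap: rejected unless on darwin.
	if err := checkSystemMemory(runtime.GOOS, est, 16*gib, 4*gib); err != nil {
		fmt.Println("refusing to load:", err)
		return
	}
	fmt.Println("load allowed")
}
```

Switching the condition from `== "linux"` to `!= "darwin"` means the check applies to every platform other than macOS, which includes Windows.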