Unverified commit fe5b9bb2 authored by Devon Rifkin, committed by GitHub

lower default num parallel to 2

This is in part to "pay" for #10452, which doubled the default context length. The combination isn't fully neutral, though: even though the old 4x2k limit and the new 2x4k limit are memory-equivalent, the 1x fallback is larger with 4k.
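
The arithmetic behind that trade-off can be sketched as follows. This is an illustration only: `kvTokens` is a hypothetical helper, "2k" and "4k" are assumed to mean 2048 and 4096 tokens, and memory is assumed to scale linearly with parallel slots times context length.

```go
package main

import "fmt"

// kvTokens returns the total number of context tokens reserved across all
// parallel slots. Hypothetical helper for illustration; it assumes memory
// use is proportional to parallel * ctxLen.
func kvTokens(parallel, ctxLen int) int {
	return parallel * ctxLen
}

func main() {
	oldFull := kvTokens(4, 2048)     // old default: 4 slots of 2k
	newFull := kvTokens(2, 4096)     // new default: 2 slots of 4k
	oldFallback := kvTokens(1, 2048) // old fallback: 1 slot of 2k
	newFallback := kvTokens(1, 4096) // new fallback: 1 slot of 4k

	fmt.Println(oldFull, newFull)         // 8192 8192
	fmt.Println(oldFallback, newFallback) // 2048 4096
}
```

The full configurations reserve the same number of tokens, but the smallest configuration the scheduler can fall back to is twice as large as before, which is why the change is not fully neutral.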
parent 6ec71d8f
@@ -58,7 +58,7 @@ var defaultModelsPerGPU = 3
 // Default automatic value for parallel setting
 // Model will still need to fit in VRAM. If this setting won't fit
 // we'll back off down to 1 to try to get it to fit
-var defaultParallel = 4
+var defaultParallel = 2
 var ErrMaxQueue = errors.New("server busy, please try again. maximum pending requests exceeded")