Commit ad6f6a1d authored by Jesse Gross, committed by Jesse Gross

llm: Change memory allocation backoff from exponential to incremental

If we create a memory layout that should fit based on reported free VRAM
but allocation still fails, we start applying a backoff. This reduces
free VRAM by an exponential percentage (1%, 2%, 4%...). However, the
points chosen tend to be too dense at the beginning and too sparse at
the end. Therefore, this switches to an incremental backoff (10%, 20%,
30%...).
parent 6723a40b
@@ -766,15 +766,12 @@ nextOperation:
 	// Memory allocation failed even though we created a layout that we thought should
 	// fit in available memory. This could happen if either our free memory reports
 	// are incorrect or if available memory is changing between layout and allocation
-	// time. Apply an exponential backoff to try to find the real amount of available
-	// space.
+	// time. Apply a backoff to try to find the real amount of available space.
 	if backoff > 1 {
 		slog.Warn("memory layout cannot be allocated", "memory", resp.Memory)
 		return nil, errors.New("memory layout cannot be allocated")
-	} else if backoff == 0 {
-		backoff = 0.01
 	} else {
-		backoff *= 2
+		backoff += 0.1
 	}
 	slog.Info("model layout did not fit, applying backoff", "backoff", fmt.Sprintf("%.2f", backoff))
...