Commits · d5a0d8d904baaf66a5326463a409fe4fa09b2dd2 · OpenDAS / ollama

14 Aug, 2025 1 commit

Jesse Gross authored May 29, 2025

This changes the memory allocation strategy from upfront estimation to
tracking actual allocations done by the engine and reacting to that. The
goal is to avoid issues caused by both under-estimation (crashing) and
over-estimation (low performance due to under-utilized GPUs).
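As a rough illustration of the difference between the two strategies, the Python sketch below decides how many model layers to place on a GPU from sizes that are assumed to have been reported by the engine after it actually allocated them, rather than from a size predicted up front. It then backs off one layer at a time until the measured GPU usage fits. `AllocationTracker`, `fit_gpu_layers`, and the `"gpu"`/`"cpu"` device labels are hypothetical names chosen for illustration. This is not the commit's implementation (Ollama itself is written in Go).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AllocationTracker:
    """Hypothetical record of bytes the engine actually allocated per device."""

    used: dict[str, int] = field(default_factory=dict)

    def record(self, device: str, nbytes: int) -> None:
        # With tracking, this is fed by real allocations, not up-front predictions.
        self.used[device] = self.used.get(device, 0) + nbytes

    def total(self, device: str) -> int:
        return self.used.get(device, 0)


def fit_gpu_layers(layer_bytes: list[int], gpu_free: int) -> int:
    """Return how many layers to place on the GPU, given measured layer sizes.

    Starts by offloading every layer and drops one layer at a time while the
    tracked GPU usage exceeds the free memory the GPU reports.
    """
    for n in range(len(layer_bytes), 0, -1):
        tracker = AllocationTracker()
        for i, size in enumerate(layer_bytes):
            # The first n layers go to the GPU; the rest stay on the CPU.
            tracker.record("gpu" if i < n else "cpu", size)
        if tracker.total("gpu") <= gpu_free:
            return n
    return 0  # nothing fits: run entirely on the CPU
```

For example, with three 100-byte layers and 250 bytes free, `fit_gpu_layers([100, 100, 100], 250)` returns `2`. Because the decision rests on measured sizes, it neither overcommits the GPU (the crash case) nor leaves memory idle because of a pessimistic estimate.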

It is currently opt-in and can be enabled for models running on the
Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other
cases is unchanged and will continue to use the existing estimates.
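A minimal sketch of the opt-in rule described above, assuming the flag counts as enabled only when it is exactly `1` and that the caller already knows which engine will run the model. `use_new_estimates` and the `"ollama"` engine label are hypothetical names for illustration. In practice the variable would be set in the environment of the server process, for example `OLLAMA_NEW_ESTIMATES=1 ollama serve`.

```python
import os
from collections.abc import Mapping


def use_new_estimates(engine: str, env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical gate: use tracked allocations only when both conditions hold.

    - the model runs on the Ollama engine (other engines keep the old estimates)
    - OLLAMA_NEW_ESTIMATES is set to "1" (assumed spelling of "enabled")
    """
    return engine == "ollama" and env.get("OLLAMA_NEW_ESTIMATES") == "1"
```

Passing the environment as a parameter keeps the check easy to exercise: `use_new_estimates("ollama", {"OLLAMA_NEW_ESTIMATES": "1"})` is `True`, while any other engine, or an unset variable, falls back to the existing estimates.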

d5a0d8d9

08 May, 2025 1 commit

lint: enable usetesting, disable tenv (#10594) · 6e9a7a25

Michael Yang authored May 08, 2025

6e9a7a25

17 Dec, 2024 2 commits

llm: do not error on "null" format (#8139) · 2ddc32d5
Blake Mizerany authored Dec 17, 2024
This fixes another regression in the previous commit that fixed other
known bugs.
2ddc32d5

llm: do not silently fail for supplied, but invalid formats (#8130) · 87f0a49f

Blake Mizerany authored Dec 16, 2024

Changes in #8002 introduced fixes for bugs that mangled JSON Schemas.
They also fixed a bug where the server would silently fail when clients
requested invalid formats. Unfortunately, they also introduced a bug
where the server would reject requests with an empty format, which
should be allowed.

The change in #8127 updated the code to allow the empty format, but it
also reintroduced the regression where the server would silently fail
when the format was set but invalid.

This commit fixes both regressions. The server does not reject the empty
format, but it does reject invalid formats. It also adds tests to help
us catch regressions in the future.

Also, the updated code provides a more detailed error message when a
client sends a non-empty but invalid format, echoing the invalid format
in the response.
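The accept/reject rules above can be sketched as follows. This is a hypothetical Python validator, not the server's Go code. It assumes the `format` field arrives as raw JSON bytes and that the only valid non-empty values are the string `"json"` or a JSON object treated as a JSON Schema. It also treats a literal `null` (and, as an assumption, a JSON empty string) as "no format", in line with the later "null" format fix listed above. `validate_format`, `InvalidFormatError`, and the message wording are illustrative names, not taken from the commit.

```python
import json


class InvalidFormatError(ValueError):
    """Raised for a non-empty format that is neither "json" nor a schema object."""


def _reject(text: bytes) -> InvalidFormatError:
    # Echo the offending value back so the client can see what was wrong.
    shown = text.decode("utf-8", errors="replace")
    return InvalidFormatError(
        f"invalid format: {shown!r}; expected \"json\" or a JSON Schema object"
    )


def validate_format(raw: bytes) -> None:
    """Hypothetical check for a request's raw `format` field."""
    text = raw.strip()
    # Empty, JSON null, and JSON "" all mean "no format requested" (assumption).
    if text in (b"", b"null", b'""'):
        return
    try:
        value = json.loads(text)
    except (json.JSONDecodeError, UnicodeDecodeError) as err:
        raise _reject(text) from err
    # "json" selects generic JSON output; an object is taken as a JSON Schema.
    if value == "json" or isinstance(value, dict):
        return
    raise _reject(text)
```

An empty body passes silently, while `validate_format(b'"yaml"')` raises `InvalidFormatError` with `"yaml"` in its message, so neither regression (rejecting empty formats, silently ignoring invalid ones) can recur without a test failing.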

This commit also takes the opportunity to remove superfluous linter
checks.

87f0a49f