- 11 Nov, 2025 11 commits
-
-
Jesse Gross authored
We currently assign model layers to GPUs according to free VRAM, which assumes that GPU performance is roughly equal. This does not work well for mixed dGPU and iGPU systems, because iGPUs typically use system memory, which is large but slow. This change instead assigns layers to dGPUs first and then to iGPUs. In the future, this could be generalized into a more fine-grained notion of GPU performance, but the dGPU vs. iGPU gap is the most extreme case.
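A minimal sketch of the idea, assuming hypothetical device types (the real discovery and scheduling structures in Ollama differ): order devices so that discrete GPUs come ahead of integrated ones before layers are assigned.

```go
package main

import (
	"fmt"
	"sort"
)

// gpu is a hypothetical device descriptor; the real discovery types differ.
type gpu struct {
	ID         string
	Integrated bool
	FreeVRAM   uint64 // bytes
}

// orderForOffload puts discrete GPUs ahead of integrated ones so that layers
// are assigned to dGPUs first; within each class, more free VRAM wins.
func orderForOffload(gpus []gpu) {
	sort.SliceStable(gpus, func(i, j int) bool {
		if gpus[i].Integrated != gpus[j].Integrated {
			return !gpus[i].Integrated
		}
		return gpus[i].FreeVRAM > gpus[j].FreeVRAM
	})
}

func main() {
	gpus := []gpu{
		{ID: "igpu0", Integrated: true, FreeVRAM: 32 << 30}, // large but slow
		{ID: "dgpu0", Integrated: false, FreeVRAM: 8 << 30},
	}
	orderForOffload(gpus)
	fmt.Println(gpus[0].ID, gpus[1].ID) // dgpu0 igpu0
}
```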
-
Jesse Gross authored
Originally, llamaServer represented the old memory estimates, which could be used with either the old or new engine, while ollamaServer was used only for the new estimates and the new engine. Since these implementations did not map directly to an engine, engine-specific code ended up in common code paths. Now that the new estimates are always used with the new engine, there is a direct mapping between server type and engine. This change moves most of the engine-specific code into the corresponding implementation to make things easier to understand.
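Roughly, the structure this enables looks like the sketch below; the interface and method names are illustrative, not the actual ones in the codebase.

```go
package main

import "fmt"

// Illustrative only: with a one-to-one mapping between server type and engine,
// engine-specific behavior can live inside each implementation instead of in
// shared code paths. The names below are hypothetical.
type modelServer interface {
	EngineName() string
	Load(layers int) error
}

// llamaServer: old estimates, old (llama.cpp) engine.
type llamaServer struct{}

func (llamaServer) EngineName() string    { return "llama" }
func (llamaServer) Load(layers int) error { return nil } // llama.cpp-specific loading

// ollamaServer: new estimates, new engine.
type ollamaServer struct{}

func (ollamaServer) EngineName() string    { return "ollama" }
func (ollamaServer) Load(layers int) error { return nil } // new-engine loading

func main() {
	servers := []modelServer{llamaServer{}, ollamaServer{}}
	for _, s := range servers {
		// Common code no longer needs engine-specific branches here.
		fmt.Println(s.EngineName(), s.Load(0))
	}
}
```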
-
Jesse Gross authored
Currently, both the old and new engines have their own code to calculate how much memory a model requires and to lay the layers out across GPUs. This change reuses the new engine's layout code for the old engine as well, bringing them closer together. The old engine continues to use its current method of estimating required memory. This reduces maintenance effort and improves consistency, as new features only need to be implemented in one place. The newer code is also more accurate, especially with multiple GPUs.
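A hedged sketch of the shape this takes: either engine's memory estimate feeds one shared layout routine. The function and type names here are invented for illustration and do not match the real code.

```go
package main

import "fmt"

// estimate is a hypothetical per-layer memory requirement; the old engine
// keeps producing its own numbers, and only the layout step is shared.
type estimate struct {
	LayerSizes []uint64 // bytes per layer
}

// layout greedily packs layers onto GPUs in the given (already ordered) list,
// returning how many layers each GPU receives.
func layout(freeVRAM []uint64, est estimate) []int {
	counts := make([]int, len(freeVRAM))
	g := 0
	for _, size := range est.LayerSizes {
		for g < len(freeVRAM) && freeVRAM[g] < size {
			g++ // this GPU is full; move on to the next one
		}
		if g == len(freeVRAM) {
			break // remaining layers stay on the CPU
		}
		freeVRAM[g] -= size
		counts[g]++
	}
	return counts
}

func main() {
	free := []uint64{2 << 30, 1 << 30} // 2 GiB and 1 GiB of free VRAM
	est := estimate{}
	for i := 0; i < 12; i++ {
		est.LayerSizes = append(est.LayerSizes, 256<<20) // 256 MiB per layer
	}
	fmt.Println(layout(free, est)) // [8 4]
}
```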
-
Jesse Gross authored
We used to control how llama.cpp saw devices using CUDA_VISIBLE_DEVICES or similar. This ensured that the layers offloaded to a device were actually the ones intended, which is particularly important because we might reorder devices based on free memory or performance. When we started explicitly scheduling layers, this logic went away, but the llamarunner had no way to set the correct order of devices. That meant the correct number of layers would be assigned to a device, but not necessarily the layers that were expected. This change sets up the devices correctly based on the offload information.
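For illustration only, a sketch of the environment-variable style of device ordering the message refers to; the helper name is invented, and the exact mechanism the change uses to apply the offload information may differ.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// visibleDevicesEnv builds a CUDA_VISIBLE_DEVICES value from an ordered list
// of device IDs so the runner sees devices in the scheduler's chosen order.
// Illustrative sketch; not the actual helper used by the change.
func visibleDevicesEnv(orderedIDs []string) string {
	return "CUDA_VISIBLE_DEVICES=" + strings.Join(orderedIDs, ",")
}

func main() {
	// Suppose the scheduler decided device 1 should come before device 0.
	env := append(os.Environ(), visibleDevicesEnv([]string{"1", "0"}))
	fmt.Println(env[len(env)-1]) // CUDA_VISIBLE_DEVICES=1,0
}
```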
-
Eva H authored
-
Baptiste Jamin authored
Adds logprobs support to Ollama's API, including Ollama's OpenAI-compatible API. By specifying the new 'logprobs' boolean parameter, Ollama returns the log probability of each generated token. An integer 'top_logprobs' parameter (up to 20) can also be specified; when set, the API additionally returns that many of the most likely tokens at each token position.

Co-authored-by: Baptiste Jamin <baptiste@crisp.chat>
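A minimal usage sketch against a local Ollama server. The parameter names come from the commit message; their placement at the top level of a /api/generate request is an assumption, so check the API docs for the exact schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Field placement is assumed; only 'logprobs' and 'top_logprobs' are named
	// in the commit message.
	body, _ := json.Marshal(map[string]any{
		"model":        "llama3.2",
		"prompt":       "Why is the sky blue?",
		"stream":       false,
		"logprobs":     true, // return log probabilities for each generated token
		"top_logprobs": 5,    // also return the 5 most likely tokens per position
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // response should include per-token log probabilities
}
```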
-
Eva Ho authored
-
Sheikh authored
-
Eva Ho authored
-
Eva Ho authored
-
Eva Ho authored
-
- 10 Nov, 2025 1 commit
-
-
Eva H authored
-
- 08 Nov, 2025 3 commits
-
-
Bruce MacDonald authored
-
Patrick Devine authored
-
Parth Sareen authored
-
- 07 Nov, 2025 2 commits
-
-
Daniel Hiltgen authored
* doc: re-add login autostart FAQ. This appears to have been accidentally dropped during the doc migration.
* docs: GPU updates lost on the doc update
* review comments: improve Windows login disable instructions
-
Tomoya Fujita authored
-
- 06 Nov, 2025 15 commits
-
-
Thomas Stocker authored
* Remove unnecessary macOS 13 patch
* Remove unnecessary macOS version guard patch
* Rename patches
* Remove macOS 13 patch again
* Rename files
-
Jeffrey Morgan authored
-
Saifeddine ALOUI authored
-
breatn authored
-
Jeffrey Morgan authored
-
Eva Ho authored
-
Eva H authored
feat: add support for WebP images in Ollama's app
-
Eva Ho authored
-
Daniel Hiltgen authored
-
Daniel Alejandro Coll Tejeda authored
-
7394112478 authored
-
mags0ft authored
-
Saifeddine ALOUI authored
-
Vincent Koc authored
-
Eva Ho authored
-
- 05 Nov, 2025 8 commits
-
-
Eva Ho authored
-
Eva Ho authored
-
Eva Ho authored
-
Daniel Hiltgen authored
-
Daniel Alejandro Coll Tejeda authored
-
nicole pardal authored
Co-authored-by: A-Akhil <akhilrahul70@gmail.com>

This PR introduces a new ollama embed command that allows users to generate embeddings directly from the command line.

* Added ollama embed MODEL [TEXT...] command for generating text embeddings
* Supports both direct text arguments and stdin piping for scripted workflows
* Outputs embeddings as JSON arrays (one per line)
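A sketch of a scripted workflow around the new command, consuming its one-JSON-array-per-line output from Go. The model name is an arbitrary example; the output format is as described in the commit message.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os/exec"
)

func main() {
	// Example model and texts; any embedding-capable model would do.
	cmd := exec.Command("ollama", "embed", "all-minilm", "first text", "second text")
	out, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	scanner := bufio.NewScanner(out)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // embedding lines can be long
	for scanner.Scan() {
		var embedding []float64
		if err := json.Unmarshal(scanner.Bytes(), &embedding); err != nil {
			panic(err)
		}
		fmt.Printf("got %d-dimensional embedding\n", len(embedding))
	}
	if err := cmd.Wait(); err != nil {
		panic(err)
	}
}
```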
-
Daniel Hiltgen authored
The scheduler updates free VRAM based on the currently loaded models. This was mutating the persisted list of GPUs, and, coupled with the non-refreshing logic for Metal, led to stale low VRAM reporting after unload. The fix is to make sure GPU discovery always returns a copy, so the scheduler's GPU list is truly ephemeral and doesn't leak any temporary adjustments back into the persistent list.
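The pattern, in a minimal sketch with hypothetical type and function names: discovery hands out a copy, so the scheduler's bookkeeping never touches the persistent list.

```go
package main

import "fmt"

// gpuInfo is a hypothetical discovery record; the real type has many more fields.
type gpuInfo struct {
	ID       string
	FreeVRAM uint64
}

// cached is the persistent list maintained by discovery.
var cached = []gpuInfo{{ID: "0", FreeVRAM: 24 << 30}}

// getGPUs returns a copy so callers (like the scheduler) can adjust free VRAM
// for their own bookkeeping without mutating the persistent list.
func getGPUs() []gpuInfo {
	out := make([]gpuInfo, len(cached))
	copy(out, cached)
	return out
}

func main() {
	scheduled := getGPUs()
	scheduled[0].FreeVRAM -= 8 << 30 // scheduler accounts for a loaded model

	fmt.Println(scheduled[0].FreeVRAM >> 30) // 16: the scheduler's ephemeral view
	fmt.Println(cached[0].FreeVRAM >> 30)    // 24: the persistent list is untouched
}
```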
-
Patrick Devine authored
-