1. 09 Jan, 2026 1 commit
    • Add experimental MLX backend and engine with imagegen support (#13648) · 33ee7168
      Daniel Hiltgen authored
      * WIP - MLX backend with gemma3
      
      * MLX: add cmake and go tag build toggles
      
      To build the new MLX backend code:
        cmake --preset MLX
        cmake --build --preset MLX --parallel
        cmake --install build --component MLX
        go build -tags mlx .
      
      Note: the main.go entrypoint for the MLX engine will change in a follow up commit.
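
      As a rough sketch of how the go tag toggle typically looks (file and
      package names here are illustrative, not necessarily what this commit
      uses), MLX-dependent code sits behind a build constraint, with a stub
      compiled into default builds:

        // mlx_enabled.go: compiled only with `go build -tags mlx .`

        //go:build mlx

        package backend

        // mlxAvailable reports that this build includes the MLX backend.
        // Real MLX/cgo calls would live in files guarded like this one.
        func mlxAvailable() bool { return true }

        // mlx_disabled.go: the fallback compiled into default builds

        //go:build !mlx

        package backend

        // mlxAvailable reports that the MLX backend was not compiled in.
        func mlxAvailable() bool { return false }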
      
      * add experimental image generation runtime
      
      * MLX: wire up cuda build for linux
      
      * MLX: get dependencies correct and dedup
      
      This is still too large for a unified GitHub artifact, but is now "correct" for the mlx_cuda_v13
      directory.
      
      * fix relative link bug in dedup
      
      * Add darwin build and readme
      
      * add go build tag for mlx dependent code and wire up build_darwin.sh
      
      * lint cleanup
      
      * macos: build mlx for x86
      
      This will be CPU only.
      
      * cuda build instructions and fix drift from mlx bump
      
      * stale comment
      
      * Delete agent helper doc
      
      * Clean up readme.md
      
      * Revise README for tokenizer clarity and details
      
      Updated README to clarify tokenizer functionality and removed correctness section.
      
      ---------
      Co-authored-by: jmorganca <jmorganca@gmail.com>
  2. 06 Jan, 2026 1 commit
    • preserve tool definition and call JSON ordering (#13525) · e51dead6
      Devon Rifkin authored
      * preserve tool definition and call JSON ordering
      
      This is another iteration of
      <https://github.com/ollama/ollama/pull/12518>, but this time we've
      simplified things by relaxing the competing requirements of being both
      compatible and order-preserving with templates (vs. renderers). We
      maintain backwards compatibility at the cost of not guaranteeing order
      for templates. We plan to move more and more models to renderers,
      which have been updated to use the new order-preserving data types
      (see the sketch at the end of this entry), and we could additionally
      add an opt-in way for templates to get an order-preserved list
      (e.g., via sibling template vars).
      
      * orderedmap_test: remove testify
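
      A minimal sketch of the insertion-order-preserving idea behind these
      data types (independent of ollama's actual type names, which may
      differ): keep a key slice alongside a map and implement json.Marshaler
      so keys serialize in the order they were first set.

        package orderedmap

        import (
            "bytes"
            "encoding/json"
        )

        // Map preserves key insertion order when marshaling to JSON; the
        // built-in map type has no defined iteration order.
        type Map struct {
            keys   []string
            values map[string]json.RawMessage
        }

        func New() *Map {
            return &Map{values: make(map[string]json.RawMessage)}
        }

        // Set records the key on first insertion; values must be valid JSON.
        func (m *Map) Set(key string, value json.RawMessage) {
            if _, ok := m.values[key]; !ok {
                m.keys = append(m.keys, key)
            }
            m.values[key] = value
        }

        // MarshalJSON writes the keys in insertion order.
        func (m *Map) MarshalJSON() ([]byte, error) {
            var buf bytes.Buffer
            buf.WriteByte('{')
            for i, k := range m.keys {
                if i > 0 {
                    buf.WriteByte(',')
                }
                kb, err := json.Marshal(k)
                if err != nil {
                    return nil, err
                }
                buf.Write(kb)
                buf.WriteByte(':')
                buf.Write(m.values[k])
            }
            buf.WriteByte('}')
            return buf.Bytes(), nil
        }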
  3. 03 Jan, 2026 1 commit
  4. 18 Dec, 2025 3 commits
  5. 16 Dec, 2025 1 commit
    • types: ConfigV2 and RootFS (#13504) · 45c47393
      Bruce MacDonald authored
      Refactored the ConfigV2 and RootFS types out of server/images.go into a new types/model/config.go file under the model package, and updated all references to use model.ConfigV2 and model.RootFS. This allows other projects to use these types without having to compile the C code in the llama package (a usage sketch follows below).
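
      Assuming the usual module path, a downstream project can now import
      just the types package; a minimal sketch:

        package main

        import (
            "fmt"

            // Importing the pure-Go types package does not pull in the
            // cgo-based llama code.
            "github.com/ollama/ollama/types/model"
        )

        func main() {
            var cfg model.ConfigV2 // zero value; see types/model/config.go for fields
            var rootfs model.RootFS
            fmt.Printf("%+v %+v\n", cfg, rootfs)
        }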
  6. 11 Dec, 2025 2 commits
  7. 08 Dec, 2025 1 commit
  8. 05 Dec, 2025 1 commit
  9. 20 Nov, 2025 1 commit
  10. 18 Nov, 2025 1 commit
  11. 16 Nov, 2025 2 commits
  12. 13 Nov, 2025 1 commit
  13. 11 Nov, 2025 2 commits
    • llm: Use Ollama engine memory layouts for both old and new engines · f560bd07
      Jesse Gross authored
      Currently, both the old and new engines have code to calculate how
      much memory a model requires and to lay out its layers across GPUs.
      This change reuses the new engine's layout code for the old engine as
      well, bringing them closer together. The old engine continues to use
      its current method of estimating required memory.
      
      This reduces maintenance effort and improves consistency, as new
      features only need to be implemented in one place. The newer code
      is also more accurate, especially with multiple GPUs.
    • server: add logprobs and top_logprobs support to Ollama's API (#12899) · 59241c5b
      Baptiste Jamin authored
      Adds logprobs support to Ollama's API, including Ollama's
      OpenAI-compatible API. When the new 'logprobs' boolean parameter is
      set, Ollama returns the log probability of each generated token. An
      integer 'top_logprobs' parameter, up to a value of 20, can also be
      specified; when set, the API additionally returns that many of the
      most likely tokens at each token position.
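
      For example, an /api/generate request might opt in like this (a
      sketch; the model name is illustrative and the exact response shape
      may differ):

        package main

        import (
            "bytes"
            "fmt"
            "io"
            "net/http"
        )

        func main() {
            // Ask for per-token log probabilities plus the 5 most likely
            // alternatives at each position (top_logprobs maxes out at 20).
            body := []byte(`{
                "model": "llama3.2",
                "prompt": "Why is the sky blue?",
                "stream": false,
                "logprobs": true,
                "top_logprobs": 5
            }`)

            resp, err := http.Post("http://localhost:11434/api/generate",
                "application/json", bytes.NewReader(body))
            if err != nil {
                panic(err)
            }
            defer resp.Body.Close()
            out, _ := io.ReadAll(resp.Body)
            fmt.Println(string(out))
        }
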
      Co-authored-by: Baptiste Jamin <baptiste@crisp.chat>
  14. 06 Nov, 2025 2 commits
  15. 05 Nov, 2025 2 commits
  16. 04 Nov, 2025 1 commit
  17. 29 Oct, 2025 1 commit
  18. 28 Oct, 2025 1 commit
  19. 27 Oct, 2025 2 commits
    • create: inherit FROM model's renderer/parser · 1bdd8169
      Devon Rifkin authored
      On main, the `RENDERER` and `PARSER` fields from the `Modelfile` don't
      get propagated to a new model created with a `req.From` parameter. This
      is easily triggered via `ollama run qwen3-coder`, then running a save
      command like `/save qwen3-coder-custom`.
      
      Added a regression test for this, and now open the config of the
      "from" model in order to use its renderer/parser as defaults for the
      new model. This fixes both CLI and API-based creates.
      
      Fixes: https://github.com/ollama/ollama/issues/12792
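
      Roughly, the fallback looks like this (identifiers are illustrative,
      not the exact ollama internals):

        package server

        import "github.com/ollama/ollama/types/model"

        // inheritFromConfig sketches the fix: fields the new Modelfile
        // leaves empty default to the FROM model's config.
        func inheritFromConfig(newCfg, fromCfg *model.ConfigV2) {
            if newCfg.Renderer == "" {
                newCfg.Renderer = fromCfg.Renderer
            }
            if newCfg.Parser == "" {
                newCfg.Parser = fromCfg.Parser
            }
        }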
    • server: Consolidate embedding truncation in runner (#12730) · 5d347f6d
      nicole pardal authored
      Currently, checking that embedding prompts fit in the context window
      (and truncating them if necessary) happens in two places: the Ollama
      server and the runner. This can lead to inconsistencies in both the
      checks and the reported number of tokens processed. Since this
      processing has to happen in the runner anyway, this consolidates all
      of the logic there.
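
      A sketch of the consolidated runner-side check (identifiers are
      illustrative, not the exact runner code):

        package runner

        import "fmt"

        // truncate enforces the context window in one place: inputs that
        // fit pass through, oversized inputs are truncated when allowed
        // and rejected otherwise, so the reported token count matches
        // what was actually processed.
        func truncate(tokens []int, numCtx int, allowTruncate bool) ([]int, error) {
            if len(tokens) <= numCtx {
                return tokens, nil
            }
            if !allowTruncate {
                return nil, fmt.Errorf("input length %d exceeds context length %d", len(tokens), numCtx)
            }
            return tokens[:numCtx], nil
        }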
  20. 25 Oct, 2025 1 commit
  21. 23 Oct, 2025 1 commit
    • DRY out the runner lifecycle code (#12540) · 3258a89b
      Daniel Hiltgen authored
      * DRY out the runner lifecycle code
      
      Now that discovery uses the runners as well, this unifies the runner spawning code
      into a single place. It also unifies the GPU discovery types with the newer ml.DeviceInfo.
      
      * win: make incremental builds better
      
      Place build artifacts in discrete directories so incremental builds don't have to start fresh
      
      * Adjust sort order to consider iGPUs
      
      * handle CPU inference OOM scenarios
      
      * review comments
  22. 22 Oct, 2025 1 commit
  23. 20 Oct, 2025 1 commit
  24. 17 Oct, 2025 1 commit
    • test: harden scheduler tests (#12662) · 68e04c7f
      Daniel Hiltgen authored
      * test: harden scheduler tests
      
      This removes reschedDelay, which was stale code, and adds a new
      configurable timeout for waitForVRAMRecovery, so tests can set the
      timeout very short and avoid the scheduler getting stuck and hitting
      a test timeout (see the sketch at the end of this entry).
      
      * test: tune tests for partial loads
      
      Give stress tests more time when the model is split between CPU/GPU
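
      One common shape for such a configurable timeout (a sketch, not
      necessarily the PR's exact code) is a field on the scheduler with a
      production default that tests override with a very small value:

        package server

        import "time"

        // Scheduler carries the VRAM-recovery timeout as a field so tests
        // can shrink it instead of waiting out the production default.
        type Scheduler struct {
            vramRecoveryTimeout time.Duration
        }

        func NewScheduler() *Scheduler {
            return &Scheduler{vramRecoveryTimeout: 5 * time.Second} // illustrative default
        }

        // waitForVRAMRecovery polls until VRAM is free or the timeout
        // elapses, so a stuck scheduler fails fast instead of hanging.
        func (s *Scheduler) waitForVRAMRecovery(recovered func() bool) bool {
            deadline := time.After(s.vramRecoveryTimeout)
            ticker := time.NewTicker(10 * time.Millisecond)
            defer ticker.Stop()
            for {
                select {
                case <-deadline:
                    return false
                case <-ticker.C:
                    if recovered() {
                        return true
                    }
                }
            }
        }

      A test can then construct the scheduler with a millisecond-scale
      timeout and exercise the stuck path without tripping the test timeout.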
  25. 16 Oct, 2025 1 commit
  26. 14 Oct, 2025 1 commit
  27. 13 Oct, 2025 1 commit
    • Qwen3VL Cloud Parser and Renderer (#12526) · 05982a95
      Grace authored
      * working for tool calls and tools (other than tool calls being in the incorrect order)
      
      * Tests work, other than image tags (tests do not go through the server) and tools (not in the correct order, but the contents are the same)
      
      * testing for qwen3vl parser - toolparser is working
      
      * made changes to the JSON tool parser: wrap the ToolCallFunction in a ToolCall object (see the sketch at the end of this entry)
      
      * Working parser for thinking models: assumes a thinking state, emits unambiguous content while thinking, and does not emit tool calls while thinking
      
      * changed the parser to start with collecting content
      
      * thinking prefill
      
      * add hasThinkingSupport parameter to parser
      
      * qwen3-vl -> qwen3-vl-instruct for renderer/parser
      
      * Add hasThinkingSupport=false to QwenVLParser
      
      ---------
      Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
  28. 11 Oct, 2025 2 commits
  29. 10 Oct, 2025 2 commits
  30. 09 Oct, 2025 1 commit
    • logs: quiet down context canceled on completion and scheduler noise (#12553) · 15e3611d
      Daniel Hiltgen authored
      * logs: quiet down context canceled on completion
      
      If the client closes the connection before Completion finishes, we were
      logging at error level, implying the runner had crashed, which was
      misleading (see the sketch at the end of this entry). For example:
      
      time=2025-10-08T22:59:20.566-07:00 level=ERROR source=server.go:1490 msg="post predict" error="Post \"http://127.0.0.1:57736/completion\": context canceled"
      
      * quiet down scheduler log error on expected case
      
      Since we don't hold the lock while performing memory load calculations, other
      runners can unload in parallel, so finding no runner to unload is a valid scenario
      that we shouldn't log at error level.
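
      The usual Go idiom for the completion case (a sketch; the PR's exact
      code may differ) is to branch on context.Canceled before picking a
      log level:

        package server

        import (
            "context"
            "errors"
            "log/slog"
        )

        // logPredictErr demotes client-initiated cancellations to debug
        // level; only unexpected failures are logged as errors.
        func logPredictErr(ctx context.Context, err error) {
            if errors.Is(err, context.Canceled) && ctx.Err() != nil {
                // The client hung up before completion finished:
                // expected, not a runner crash.
                slog.Debug("post predict canceled by client", "error", err)
                return
            }
            slog.Error("post predict", "error", err)
        }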