Commits · 2bccf8c6249eed1e85758d41eeb614d951cafec4 · OpenDAS / ollama

09 Dec, 2025 4 commits
- renderers/parsers: olmo3 instruct (#13383) · 2bccf8c6
  Parth Sareen authored Dec 09, 2025
  
  2bccf8c6
- parsers/renderers: olmo3 think (#13290) · 0c5e5f66
  Parth Sareen authored Dec 09, 2025
  
  0c5e5f66
- fix: qwen2.5vl metal argsort · d475d1f0
  Michael Yang authored Dec 08, 2025
  
  d475d1f0
- model: add rnj-1 inference support (#13354) · d2f334c1
  Jeffrey Morgan authored Dec 08, 2025
  
  d2f334c1
08 Dec, 2025 5 commits
- refactor rope · 603ceefa
  Michael Yang authored Nov 18, 2025
```
change to a flatter directory structure and group the options with the
function

update models to call rope in one place
```
  603ceefa
- truncation: fixed runner truncation logic + removed server truncation (#12839) · e082d60a
  nicole pardal authored Dec 08, 2025
```
This PR consolidates all embedding prompt-length checking, truncation, and prompt token counting into the runner to ensure a single source of truth.
```
  e082d60a
- CI: use vendor base commit in cache keys (#13348) · 5dae7380
  Daniel Hiltgen authored Dec 08, 2025
```
Prevent CGO from accidentally reusing old object files from the cache
across vendor updates
```
  5dae7380
- readme: fix broken Swollama link in community integrations (#13370) · 0c787231
  JJ authored Dec 07, 2025
  
  0c787231
- fs/ggml: write int32 and int64 values to gguf files (#13335) · 5a41d69b
  Jeffrey Morgan authored Dec 07, 2025
  
  5a41d69b
06 Dec, 2025 1 commit

ggml: handle all streams (#13350) · c146a138

Daniel Hiltgen authored Dec 05, 2025

Follow up from #12992

Free all streams, and keep the alloc logic aligned across streams.

c146a138

05 Dec, 2025 1 commit

fix(api): correct Content-Type header for /api/chat and /api/generate when using cloud models (#13279) · 31b8c6a2

Sos Pogosyan authored Dec 05, 2025

---------
Co-authored-by: Pogosyan Sos <sos_pogosyan@MacBook-Pro-Sos.local>
Co-authored-by: Patrick Devine <patrick@infrahq.com>

31b8c6a2

04 Dec, 2025 7 commits

llm: Enable flash attention for mistral3 by default · 9191dfaf
Jesse Gross authored Dec 04, 2025

9191dfaf

ggml: Enable flash attention for vision encoders · 1108d8b3

Jesse Gross authored Dec 02, 2025

Although the vision component of multimodal models typically already
calls the optimized nn.Attention, it is converted into non-fused
operations. That is because the backend-specific fused kernels may
have requirements, such as padding, and that padding is performed by
the cache, which vision encoders don't use.

This implements a fallback path in the backend, softening the
requirements into optimizations. In turn, this allows flash attention
to be used for vision encoders, saving a significant amount of VRAM
and improving performance.

1108d8b3

ggml: Always set cache padding to 256 · 7837a5bc

Jesse Gross authored Dec 04, 2025

We currently use cache padding of 32 when not using flash attention
and 256 with flash attention, which is based on the historic alignment
requirements of these kernels. The restrictions have since been
loosened but there are still performance benefits, such as better
CUDA graph reuse.

Since the requirement is no longer kernel-specific, set the padding
uniformly to 256, as llama.cpp has.
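
As a rough illustration of what a uniform padding of 256 means for a cache size, the sketch below rounds a requested length up to the next multiple of the padding. The names `cachePadding` and `roundUp` are hypothetical and are not taken from the change itself.

```go
package main

import "fmt"

// cachePadding mirrors the uniform value named in the commit message.
// The constant name is an assumption made for this sketch.
const cachePadding = 256

// roundUp returns the smallest multiple of pad that is >= n.
// pad must be positive.
func roundUp(n, pad int) int {
	return ((n + pad - 1) / pad) * pad
}

func main() {
	for _, n := range []int{1, 256, 257, 4000} {
		fmt.Printf("%d -> %d\n", n, roundUp(n, cachePadding))
	}
}
```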

7837a5bc

convert: add deepseek converter (#12980) · 0a844f8e

Patrick Devine authored Dec 04, 2025

This change adds the ability for `ollama create` to convert models that use
the DeepSeek2 architecture (specifically DeepSeekV3 and DeepSeek-R1).

0a844f8e

cmd/bench: support writing benchmark output to file (#13263) · a03223b8

Eloi Torrents authored Dec 04, 2025

This changes Ollama to allow the bench command to write benchmark
results to a user-specified output file instead of stdout when the
--output flag is provided.

---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>

a03223b8

ggml update to b7108 (#12992) · 0cf7794b

Daniel Hiltgen authored Dec 03, 2025

* Revert "vulkan: temporary cary of vulkan fixes (#12971)"

This reverts commit 3a9e8e9f.

* ggml update to b7087

* fix argsort on metal

* update to b7108

* fix bakllava regression

This model lacks the metadata for the projector type.

* update to b7209

* fix TopK perf

* only build arm code on arm

0cf7794b

ci: restore previous linter rules (#13322) · 854d40ed
Jeffrey Morgan authored Dec 03, 2025

854d40ed

03 Dec, 2025 2 commits

app: relay thinking false to server (#13319) · 84a2cedf

Bruce MacDonald authored Dec 03, 2025

This fixes a bug where disabling thinking on deepseek-v3.1 did not stop the model from thinking.

When thinking is not defined, it should not be sent to the server, since this causes error responses in some cases where the model does not support thinking. However, if it is defined as false, it should still be sent.
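
The distinction between "not defined" and "defined as false" can be sketched in Go with a pointer field and `omitempty`. The `chatRequest` type, its field names, and the model name below are hypothetical and only illustrate the idea; they are not the app's actual request type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest is a hypothetical request body for this sketch.
// A nil Think pointer is omitted from the JSON; a pointer to false is kept.
type chatRequest struct {
	Model string `json:"model"`
	Think *bool  `json:"think,omitempty"`
}

// encode marshals a request with the given thinking setting.
func encode(model string, think *bool) string {
	b, err := json.Marshal(chatRequest{Model: model, Think: think})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	off := false
	fmt.Println(encode("example-model", nil))  // {"model":"example-model"}
	fmt.Println(encode("example-model", &off)) // {"model":"example-model","think":false}
}
```

A plain `bool` with `omitempty` would drop `false` as well, which is the kind of behavior the commit message describes as a bug.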

84a2cedf

CUDA: filter devices on secondary discovery (#13317) · 3f308367

Daniel Hiltgen authored Dec 03, 2025

We now do a deeper probe of CUDA devices to verify the library version has
the correct compute capability coverage for the device. Because ROCm also
interprets the CUDA env var to filter AMD devices, we try to avoid setting
it, since setting it leads to problems in mixed-vendor systems. However,
without setting it for this deeper probe, each CUDA library subprocess
discovers all CUDA GPUs, and on systems with lots of GPUs this can lead to
hitting timeouts. The fix is to turn on the CUDA visibility env var just for
this deeper probe use case.
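
One way to scope a visibility variable to a single subprocess is to build a dedicated environment for it and leave the parent process untouched. The sketch below assumes the variable in question is `CUDA_VISIBLE_DEVICES`; the commit message does not name it, and `probeEnv` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// visibilityVar is assumed for this sketch; the commit message only
// refers to "the CUDA visibility env var".
const visibilityVar = "CUDA_VISIBLE_DEVICES"

// probeEnv returns a copy of base with the visibility variable set to
// deviceID, dropping any existing value. The result is meant to be
// assigned to a single subprocess (for example via exec.Cmd.Env), so the
// parent environment is never modified.
func probeEnv(base []string, deviceID string) []string {
	env := make([]string, 0, len(base)+1)
	for _, kv := range base {
		if !strings.HasPrefix(kv, visibilityVar+"=") {
			env = append(env, kv)
		}
	}
	return append(env, visibilityVar+"="+deviceID)
}

func main() {
	env := probeEnv(os.Environ(), "0")
	fmt.Println(env[len(env)-1])
}
```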

3f308367

02 Dec, 2025 7 commits
- Update user message format for temperature query (#13256) · cc9555af
  Nathan Hook authored Dec 02, 2025
  
  cc9555af
- Add Vulkan GPU support instructions in development.md (#13265) · 20aee967
  hello_world authored Dec 03, 2025
```
Added Vulkan SDK installation instructions and environment variable setup for building with Vulkan support.
```
  20aee967
- test: avoid ministral tools test on low vram (#13302) · 18b5958d
  Daniel Hiltgen authored Dec 02, 2025
```
Avoid hitting test timeouts
```
  18b5958d
- llm: Don't always evict models on CPU-only systems · 5317202c
  Jesse Gross authored Nov 25, 2025
```
Model eviction happens when we have at least one other model
loaded and are unable to load all layers into VRAM. However, on
CPU-only systems we can never load layers into VRAM, so this
constantly triggered eviction.

Fixes #13227
```
  5317202c
- test: add ministral-3 (#13300) · d771043e
  Daniel Hiltgen authored Dec 02, 2025
  
  d771043e
- CUDA: verify CC is supported by target library (#13298) · f8f10718
  Daniel Hiltgen authored Dec 02, 2025
  
  f8f10718
- model: ministral w/ llama4 scaling (#13292) · d3e0a0de
  Patrick Devine authored Dec 01, 2025
```
This change:

* fixes rope scaling in the mistral converter
* updates ministral to include llama4 scaling
* includes a new ministral parser for parsing reasoning and tool calling

---------
Co-authored-by: jmorganca <jmorganca@gmail.com>
```
  d3e0a0de
01 Dec, 2025 3 commits

win: warn if ggml-base detected in PATH (#13289) · 55417275

Daniel Hiltgen authored Dec 01, 2025

If the user has somehow installed another GGML based app which places a
ggml-base lib somewhere in their PATH, we can experience runtime problems
due to incompatibilities. This change adds a warning message if we detect
a ggml-base outside of our install location to aid in troubleshooting.

55417275

api/client: handle non-json streaming errors (#13007) · 5b6a8e60

Bruce MacDonald authored Dec 01, 2025

While processing the response stream during a chat or generation, if an error occurs it is parsed and returned to the user. The issue with the existing code is that it assumed the response would be valid JSON, which is not a safe assumption and caused cryptic error messages to be displayed due to parsing failures:
`invalid character 'i' looking for beginning of value`

This change updates the stream function to return the raw error string if it can't be parsed as JSON. This should help with debugging issues by making sure the actual error reaches the user.
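
The fallback described above can be sketched as follows. This is an illustration under assumptions, not the client's actual code: `streamError` and the `error` JSON field are hypothetical names chosen for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// streamError turns an error payload from a response stream into a Go
// error. If the payload is JSON with a non-empty "error" field, that
// message is used; otherwise the raw text is returned unchanged apart
// from trimmed whitespace.
func streamError(body []byte) error {
	var payload struct {
		Error string `json:"error"`
	}
	if err := json.Unmarshal(body, &payload); err == nil && payload.Error != "" {
		return errors.New(payload.Error)
	}
	// Not JSON, or no usable "error" field: surface the raw text so the
	// real message reaches the user instead of a JSON parse failure.
	return errors.New(strings.TrimSpace(string(body)))
}

func main() {
	fmt.Println(streamError([]byte(`{"error":"model not found"}`)))
	fmt.Println(streamError([]byte("internal server error\n")))
}
```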

5b6a8e60

jetpack: require exact match or skip cuda_jetpack* (#13288) · 467bbc0d

Daniel Hiltgen authored Dec 01, 2025

The cuda_jetpack libs will enumerate discrete GPUs on SBSA systems,
which leads to runtime failures due to missing kernels. This fix
requires an exact match to enable jetpacks instead of relying on
enumeration to filter out supported libraries.

467bbc0d

30 Nov, 2025 1 commit
- .gitattributes: add app/webview to linguist-vendored (#13274) · 6d9f9323
  Jeffrey Morgan authored Nov 29, 2025
  
  6d9f9323
29 Nov, 2025 1 commit
- docs: fix output formatting in faq.mdx (#13231) · 0c248960
  Ondrej Kokes authored Nov 29, 2025
```
There were a few Markdown typos in one FAQ answer. It now renders as a proper ASCII table.
```
  0c248960
26 Nov, 2025 1 commit
- docs: remove deprecated parameters (#13237) · 8b1b89a9
  EntropyYue authored Nov 26, 2025
  
  8b1b89a9
20 Nov, 2025 6 commits
- app/cmd: update ollama help to navigate to ollama doc instead of github page (#13174) · 47e272c3
  Eva H authored Nov 20, 2025
  
  47e272c3
- app: open app instead of always navigating to / on connect (#13164) · 417a81fd
  Jeffrey Morgan authored Nov 20, 2025
  
  417a81fd
- discovery: fix cuda overlap case (#13176) · dba62ff3
  Daniel Hiltgen authored Nov 20, 2025
```
Recent refactoring introduced a regression in filtering cuda overlap to favor the newest supported version.
```
  dba62ff3
- Parser for Cogito v2 (#13145) · d70e9355
  Grace authored Nov 19, 2025
  
  d70e9355
- deepseek2: upgrade to run v3+ models (#13166) · 5c1063df
  Michael Yang authored Nov 19, 2025
```
the check for mla omits v3 and r1 which should not return unsupported.
instead check the tokenizer for compatibility
```
  5c1063df
- kvcache: Run tests both with and without PermutedV · cb485b20
  Jesse Gross authored Nov 19, 2025
```
The causal cache can store data differently depending on what is
best for the backend. We should run tests both ways.
```
  cb485b20
19 Nov, 2025 1 commit
- nomic-embed: nomic-embed-text defaulted to ollama runner (#13144) · b2af5096
  nicole pardal authored Nov 19, 2025
  
  b2af5096