- 06 Jan, 2026 1 commit
  - Parth Sareen authored

- 18 Dec, 2025 1 commit
  - Jeffrey Morgan authored

- 10 Dec, 2025 2 commits
  - Eloi Torrents authored
  - Julia Scheaffer authored

- 04 Dec, 2025 1 commit
  - Eloi Torrents authored
    cmd/bench: support writing benchmark output to file

    This changes Ollama to allow the bench command to write benchmark results to a user-specified output file instead of stdout when the `--output` flag is provided.

    Co-authored-by: Patrick Devine <patrick@infrahq.com>
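The `--output` flag comes from the commit above; the rest of this sketch is illustrative, not Ollama's actual bench code, and simply shows one way to switch the result writer between stdout and a file:

```go
// Illustrative only: switch benchmark output between stdout and a file
// based on an --output flag. Not Ollama's actual bench implementation.
package main

import (
	"flag"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	output := flag.String("output", "", "write benchmark results to this file instead of stdout")
	flag.Parse()

	var w io.Writer = os.Stdout
	if *output != "" {
		f, err := os.Create(*output)
		if err != nil {
			log.Fatal(err)
		}
		defer f.Close()
		w = f
	}

	// Benchmark results would be written through w (column names hypothetical).
	fmt.Fprintln(w, "model,prefill_ms,eval_ms,load_ms,total_ms")
}
```

Usage would then look roughly like `ollama bench MODEL --output results.csv`; the exact arguments may differ from the shipped command.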

- 02 Dec, 2025 1 commit
  - Patrick Devine authored
    This change:
    * fixes rope scaling in the mistral converter
    * updates ministral to include llama4 scaling
    * includes a new ministral parser for parsing reasoning and tool calling

    Co-authored-by: jmorganca <jmorganca@gmail.com>

- 16 Nov, 2025 1 commit
  - Patrick Devine authored
    This change adds a basic benchmarking test framework for Ollama which can be used to determine the prefill, eval, load duration, and total duration for running a given model or models.
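As a rough illustration of the kind of per-run metrics such a framework reports (prefill, eval, load, and total durations), here is a hypothetical result type; the real framework's types and field names may differ:

```go
// Hypothetical shape of a single benchmark result; the real framework's
// types and field names may differ.
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type BenchResult struct {
	Model           string        `json:"model"`
	PrefillDuration time.Duration `json:"prefill_duration"`
	EvalDuration    time.Duration `json:"eval_duration"`
	LoadDuration    time.Duration `json:"load_duration"`
	TotalDuration   time.Duration `json:"total_duration"`
}

func main() {
	r := BenchResult{
		Model:           "llama3", // example model name
		PrefillDuration: 350 * time.Millisecond,
		EvalDuration:    4200 * time.Millisecond,
		LoadDuration:    1200 * time.Millisecond,
		TotalDuration:   6 * time.Second,
	}
	b, _ := json.Marshal(r)
	fmt.Println(string(b))
}
```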

- 05 Nov, 2025 2 commits
  - nicole pardal authored
    This PR introduces a new `ollama embed` command that allows users to generate embeddings directly from the command line (sketched below):
    * Added `ollama embed MODEL [TEXT...]` command for generating text embeddings
    * Supports both direct text arguments and stdin piping for scripted workflows
    * Outputs embeddings as JSON arrays (one per line)

    Co-authored-by: A-Akhil <akhilrahul70@gmail.com>
  - Patrick Devine authored
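A self-contained sketch of the argument/stdin handling and one-JSON-array-per-line output described in the embed commit above; the embedding call itself is stubbed out, and none of this is the actual `ollama embed` implementation:

```go
// Sketch only: read texts from arguments or stdin and print one JSON array
// per line. The embed function is a stub standing in for the embeddings API
// call; this is not the actual `ollama embed` code.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

func embed(text string) []float32 {
	// Stub: a real implementation would call the embeddings API.
	return []float32{float32(len(text)), 0, 1}
}

func main() {
	var texts []string
	if len(os.Args) > 1 {
		texts = os.Args[1:] // direct text arguments
	} else {
		sc := bufio.NewScanner(os.Stdin) // stdin piping for scripted workflows
		for sc.Scan() {
			texts = append(texts, sc.Text())
		}
	}
	for _, t := range texts {
		line, _ := json.Marshal(embed(t))
		fmt.Println(string(line)) // one embedding per line
	}
}
```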

- 30 Oct, 2025 1 commit
  - Michael Yang authored
    This change fixes two bugs with `ollama rm`:
    1. Before a model is removed, it should first be stopped. Previously this only happened for the first argument and was skipped for all other models.
    2. Models were unloaded indiscriminately. This errors for cloud models, so the unload should be omitted for them.
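A hedged sketch of the corrected removal loop implied by the fix: every model argument gets stopped before removal, and the unload is skipped for cloud models. All helper functions are hypothetical stand-ins:

```go
// Hedged sketch of the corrected behavior described above: stop (unload)
// each model argument before removing it, and skip the unload step for
// cloud models, which cannot be unloaded locally. Helpers are hypothetical.
package main

import "fmt"

func isCloudModel(name string) bool { return false }                              // hypothetical
func stopModel(name string) error   { fmt.Println("stopping", name); return nil } // hypothetical
func removeModel(name string) error { fmt.Println("removing", name); return nil } // hypothetical

func removeAll(names []string) error {
	for _, name := range names {
		// Previously only the first argument was stopped; do it for every model.
		if !isCloudModel(name) {
			if err := stopModel(name); err != nil {
				return err
			}
		}
		if err := removeModel(name); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := removeAll([]string{"llama3", "mistral"}); err != nil {
		fmt.Println("error:", err)
	}
}
```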

- 26 Sep, 2025 1 commit
  - Patrick Devine authored
    There are two bugs when using `/load <model>` for a model that doesn't exist:
    1. it will not restore the current model settings if the current model is a thinking model; and
    2. it will crash if the current model is a non-thinking model.
    This fix saves the current runOptions and then restores them if the model load doesn't happen. It also fixes the crash for non-thinking models.
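The save-and-restore pattern the fix describes might look roughly like this; the types and helpers are hypothetical, not the CLI's actual code:

```go
// Hedged illustration of the save-and-restore pattern: snapshot the current
// run options and put them back if loading the requested model fails.
package main

import (
	"errors"
	"fmt"
)

type runOptions struct {
	Model    string
	Thinking bool
}

func loadModel(name string) (runOptions, error) {
	// Stub: pretend the requested model doesn't exist.
	return runOptions{}, errors.New("model not found")
}

func handleLoad(current *runOptions, name string) {
	saved := *current // snapshot before attempting the load
	opts, err := loadModel(name)
	if err != nil {
		*current = saved // restore previous settings instead of crashing
		fmt.Println("load failed, settings restored:", err)
		return
	}
	*current = opts
}

func main() {
	cur := runOptions{Model: "qwen3", Thinking: true}
	handleLoad(&cur, "does-not-exist")
	fmt.Printf("%+v\n", cur)
}
```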

- 25 Sep, 2025 1 commit
  - Patrick Devine authored

- 23 Sep, 2025 1 commit
  - Patrick Devine authored
    auth: fix problems with the ollama keypairs

    This change adds several fixes including:
    - reading in the pubkey files correctly
    - fixing the push unit test to create a keypair file in a temp directory
    - not returning 500 errors for normal status errors

- 17 Sep, 2025 1 commit
  - Patrick Devine authored

- 11 Sep, 2025 1 commit
  - fengyuchuanshen authored

- 15 Aug, 2025 1 commit
  - Patrick Devine authored

- 05 Aug, 2025 1 commit
  - Michael Yang authored
    * bf16
    * tests
    * gpt-oss
    * enable gptoss for engine
    * rough estimate
    * convert to mxfp4
    * handle safetensors U8
    * clamp glu/linear
    * update tokenizer
    * MXFP4 support
      This implements the Open Compute Microscaling (MX) FP4 format as a tensor type with backend implementations focusing on mulmat and mulmatid on CPU, CUDA, and Metal.
    * Unit tests for MXFP4 support
      This exercises various operations and shapes on both CPU and GPU (if detected on the system).
    * cuda graph
    * unit test adjustments
    * cuda: optimize memory access
      Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4.
    * mac: fix crash on old macos versions
      cblas_sgemm is only supported on v13.3 and up, however bf16 is only supported on v14+ so we were falling back to ggml-blas and crashing on bf16 tensors. Checking for the function being null seems to be the simplest way to conditionally avoid registering the backend.
    * server: Minimum context length for gptoss
      This model requires a minimum context length of 8192 to function effectively. Users can set higher values through all normal mechanisms but lower values will be silently reset.
    * ggml: Multiply by numParallel for gptoss sliding window
      When computing the graph size estimate, the context size is already multiplied by numParallel so estimates reflect that. However, since sliding window models use a smaller, fixed context size, they need to manually take numParallel into account.
    * gpt-oss integration includes harmony parser and thinking levels, etc.
    * fix sync
    * fix tests
    * fix lint

    Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
    Co-authored-by: Jesse Gross <jesse@ollama.com>
    Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
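The minimum-context-length behavior described for gptoss above (values below 8192 are silently reset) can be illustrated with a small sketch; the constant and function names are illustrative only:

```go
// Illustrative only: enforce a minimum context length by silently raising
// values below the floor. Constant and function names are hypothetical.
package main

import "fmt"

const gptossMinContext = 8192

func effectiveContext(requested int) int {
	if requested < gptossMinContext {
		return gptossMinContext // lower values are silently reset
	}
	return requested
}

func main() {
	fmt.Println(effectiveContext(2048))  // 8192
	fmt.Println(effectiveContext(16384)) // 16384
}
```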

- 24 Jul, 2025 1 commit
  - Patrick Devine authored

- 22 Jul, 2025 1 commit
  - Patrick Devine authored
    Co-authored-by: Richard Lyons <frob@cloudstaff.com>

- 17 Jul, 2025 1 commit
  - frob authored

- 16 Jul, 2025 1 commit
  - Parth Sareen authored

- 08 Jul, 2025 1 commit
  - Daniel Hiltgen authored
    * API: expose context size of loaded models
    * CLI: add context UX
      This adds a column in the ps output to show the model's context size.
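A rough sketch of a ps-style listing with the new context column; the struct and field names here are illustrative rather than Ollama's actual API types:

```go
// Sketch of a ps-style table with a context-size column; the struct and
// field names are illustrative, not Ollama's actual API response types.
package main

import (
	"fmt"
	"os"
	"text/tabwriter"
)

type runningModel struct {
	Name       string
	Size       string
	ContextLen int
}

func main() {
	models := []runningModel{
		{"llama3:8b", "5.4 GB", 4096}, // example values
		{"qwen3:4b", "3.1 GB", 8192},
	}
	w := tabwriter.NewWriter(os.Stdout, 0, 8, 2, ' ', 0)
	fmt.Fprintln(w, "NAME\tSIZE\tCONTEXT")
	for _, m := range models {
		fmt.Fprintf(w, "%s\t%s\t%d\n", m.Name, m.Size, m.ContextLen)
	}
	w.Flush()
}
```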

- 09 Jun, 2025 1 commit
  - Daniel Hiltgen authored
    When a user elects to keep the existing app, the new Ollama is named `Ollama 2.app`. This fixes the app startup flow to handle this naming pattern.

- 08 Jun, 2025 1 commit
  - Daniel Hiltgen authored
    Give the desktop app a hint to start fast.

- 06 Jun, 2025 2 commits
  - Daniel Hiltgen authored
    When starting the app in the background, start it hidden.
  - Daniel Hiltgen authored
    Fix an array out-of-bounds crash.

- 29 May, 2025 1 commit
  - Devon Rifkin authored
    - Both `/api/generate` and `/api/chat` now accept a `"think"` option that allows specifying whether thinking mode should be on or not
    - Templates get passed this new option so, e.g., qwen3's template can put `/think` or `/no_think` in the system prompt depending on the value of the setting
    - Models' thinking support is inferred by inspecting model templates. The prefix and suffix the parser uses to identify thinking support is also automatically inferred from templates
    - Thinking control & parsing is opt-in via the API to prevent breaking existing API consumers. If the `"think"` option is not specified, the behavior is unchanged from previous versions of ollama
    - Add parsing for thinking blocks in both streaming/non-streaming mode in both `/generate` and `/chat`
    - Update the CLI to make use of these changes. Users can pass `--think` or `--think=false` to control thinking, or during an interactive session they can use the commands `/set think` or `/set nothink`
    - A `--hidethinking` option has also been added to the CLI. This makes it easy to use thinking in scripting scenarios like `ollama run qwen3 --think --hidethinking "my question here"` where you just want to see the answer but still want the benefits of thinking models
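Based on the description above, a request that opts into thinking might look like the following; this is a sketch of the documented `"think"` option rather than a copy of Ollama's api package types:

```go
// Sketch of opting into thinking via the "think" option on /api/chat, per
// the description above; the request shape is illustrative.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model": "qwen3",
		"messages": []map[string]string{
			{"role": "user", "content": "Why is the sky blue?"},
		},
		"think":  true, // omit to keep the pre-existing behavior
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```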

- 21 May, 2025 1 commit
  - Daniel Hiltgen authored
    Give the user a helpful error instead of showing connection refused errors.

- 15 May, 2025 2 commits
  - Daniel Hiltgen authored
  - Bruce MacDonald authored
    When a piece of information has been truncated in the `show` output, display an ellipsis to indicate that more data has not been shown.

- 13 May, 2025 1 commit
  - Jeffrey Morgan authored

- 10 May, 2025 1 commit
  - Bruce MacDonald authored

- 08 May, 2025 1 commit
  - Michael Yang authored

- 06 May, 2025 1 commit
  - Daniel Hiltgen authored
    * Move quantization logic to GGML via new backend
      This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
    * Remove "add model quantizations"
      This is no longer needed now that quantization is implemented in Go+GGML code directly.

- 05 May, 2025 1 commit
  - Michael Yang authored
    * default max term height
    * error on out-of-tree files

- 28 Apr, 2025 1 commit
  - Devon Rifkin authored
    This reverts commit 424f6486.

- 22 Apr, 2025 1 commit
  - Devon Rifkin authored
    * increase default context length to 4096
      We lower the default numParallel from 4 to 2 and use these "savings" to double the default context length from 2048 to 4096. We're memory neutral in cases when we previously would've used numParallel == 4, but we add the following mitigation to handle some cases where we would have previously fallen back to 1x2048 due to low VRAM: we decide between 2048 and 4096 using a runtime check, choosing 2048 if we're on a one-GPU system with total VRAM of <= 4 GB. We purposefully don't check the available VRAM because we don't want the context window size to change unexpectedly based on the available VRAM. We plan on making the default even larger, but this is a relatively low-risk change we can make to quickly double it.
    * fix tests
      Add an explicit context length so they don't get truncated. The code that converts -1 from being a signal for doing a runtime check isn't running as part of these tests.
    * tweak small gpu message
    * clarify context length default
      Also make it actually show up in `ollama serve --help`.
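The runtime check described in the first bullet (choose 2048 instead of 4096 on a single-GPU system with at most 4 GB of total VRAM) can be sketched as below; the sentinel value and names are illustrative, not the server's actual code:

```go
// Sketch of the runtime default described above: with the context length
// left unset (sentinel), choose 2048 on a single-GPU system with <= 4 GiB
// of total VRAM, otherwise 4096. Names and the sentinel are illustrative.
package main

import "fmt"

const (
	autoCtxSentinel = -1
	defaultCtx      = 4096
	lowVRAMCtx      = 2048
	lowVRAMBytes    = uint64(4) << 30 // 4 GiB
)

func resolveContextLength(requested, gpuCount int, totalVRAM uint64) int {
	if requested != autoCtxSentinel {
		return requested // explicit user setting always wins
	}
	if gpuCount == 1 && totalVRAM <= lowVRAMBytes {
		return lowVRAMCtx
	}
	return defaultCtx
}

func main() {
	fmt.Println(resolveContextLength(autoCtxSentinel, 1, uint64(2)<<30)) // 2048
	fmt.Println(resolveContextLength(autoCtxSentinel, 2, uint64(8)<<30)) // 4096
}
```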

- 20 Apr, 2025 1 commit
  - greengrass821 authored
    Co-authored-by: tooth paste <tooth_paste91@Poorneshwars-MacBook-Pro.local>

- 16 Apr, 2025 1 commit
  - Blake Mizerany authored
    This commit adds retry/backoff to the registry client for pull requests. Also, revert progress indication to match the original client's until we can "get it right." Also, make WithTrace wrap existing traces instead of clobbering them. This allows clients to compose traces.
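A generic retry-with-exponential-backoff loop of the kind the commit describes; the registry client's actual policy, error handling, and tracing hooks will differ:

```go
// Generic retry-with-exponential-backoff loop of the kind described above;
// the registry client's actual policy and error handling will differ.
package main

import (
	"errors"
	"fmt"
	"time"
)

func withRetry(attempts int, base time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		time.Sleep(base << i) // back off: base, 2*base, 4*base, ...
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

func main() {
	err := withRetry(3, 100*time.Millisecond, func() error {
		return errors.New("transient registry error") // simulate a failing pull
	})
	fmt.Println(err)
}
```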

- 14 Apr, 2025 1 commit
  - CYJiang authored