Commits · fa7776fd2458fc3a8aeb7f12e4bc65b439955319 · OpenDAS / ollama

05 Aug, 2025 1 commit

Michael Yang authored Aug 05, 2025



* bf16

* tests

* gpt-oss

* enable gptoss for engine

* rough estimate

* convert to mxfp4

* handle safetensors U8

* clamp glu/linear

* update tokenizer

* MXFP4 support

This implements the Open Compute Microscaling (MX) FP4 format
as a tensor type with backend implementations focusing
on mulmat and mulmatid on CPU, CUDA, and Metal.

* Unit tests for MXFP4 support

This exercises various operations and shapes on both CPU and GPU (if detected
on the system)

* cuda graph

* unit test adjustments

* cuda: optimize memory access

Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4

* mac: fix crash on old macos versions

cblas_sgemm is only supported on v13.3 and up, however bf16 is
only supported on v14+ so we were falling back to ggml-blas and
crashing on bf16 tensors.  Checking for the function being null
seems to be the simplest way to condittionally avoid registering the
backend.

* server: Minimum context length for gptoss

This model requires a minimum context length of 8192 to function
effectively. Users can set higher values through all normal mechanisms
but lower values will be silently reset.

* ggml: Multiply by numParallel for gptoss sliding window

When computing the graph size estimate, the context size is already
multiplied by numParallel so estimates reflect that. However, since
sliding window models use a smaller, fixed context size, they need
to manually take numParallel into account.

* gpt-oss integration

includes harmony parser and thinking levels, etc.

* fix sync

* fix tests

* fix lint

---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>

fa7776fd

24 Jul, 2025 1 commit
- cli: catch upstream errors gracefully (#11512) · 80b538e3
  Patrick Devine authored Jul 23, 2025
  
  80b538e3
22 Jul, 2025 1 commit
- Fix GetModelInfo (#11496) · 3bac5cba
  Patrick Devine authored Jul 22, 2025
```
---------
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
```
  3bac5cba
17 Jul, 2025 1 commit
- docs: add the no-Modelfile function of `ollama create` (#9077) · 802ad16c
  frob authored Jul 17, 2025
  
  802ad16c
16 Jul, 2025 1 commit
- cmd: add default assistant role to message construction (#11431) · d73f8aa8
  Parth Sareen authored Jul 16, 2025
  
  d73f8aa8
08 Jul, 2025 1 commit

API/CLI context enhancements (#11331) · 34088dbc

Daniel Hiltgen authored Jul 08, 2025

* API: expose context size of loaded models

* CLI: add context UX

This adds a column in the ps output to show the models context size.

34088dbc

09 Jun, 2025 1 commit

mac: handle "keep" named apps (#11031) · 82ad1dbc

Daniel Hiltgen authored Jun 09, 2025

When a user elects to keep the existing app, the
new Ollama is named `Ollama 2.app`
This fixes the app startup flow to handle this naming pattern.

82ad1dbc

08 Jun, 2025 1 commit
- spawn desktop quickly (#11011) · feeabdad
  Daniel Hiltgen authored Jun 08, 2025
```
Give the desktop app a hint to start fast.
```
  feeabdad
06 Jun, 2025 2 commits
- launch app hidden (#10962) · a8ed68bd
  Daniel Hiltgen authored Jun 06, 2025
```
When starting the app in the background, start it hidden.
```
  a8ed68bd
- win: handle more than 2048 processes (#10997) · 2ae65ae4
  Daniel Hiltgen authored Jun 06, 2025
```
Fix an array out of bounds crash
```
  2ae65ae4
29 May, 2025 1 commit

add thinking support to the api and cli (#10584) · 5f57b0ef

Devon Rifkin authored May 28, 2025

- Both `/api/generate` and `/api/chat` now accept a `"think"`
  option that allows specifying whether thinking mode should be on or
  not
- Templates get passed this new option so, e.g., qwen3's template can
  put `/think` or `/no_think` in the system prompt depending on the
  value of the setting
- Models' thinking support is inferred by inspecting model templates.
  The prefix and suffix the parser uses to identify thinking support is
  also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking
  existing API consumers. If the `"think"` option is not specified, the
  behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming/non-streaming mode
  in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think`
  or `--think=false` to control thinking, or during an interactive
  session they can use the commands `/se...

5f57b0ef

21 May, 2025 1 commit
- win: detect background upgrade in progress (#10785) · 7359b027
  Daniel Hiltgen authored May 21, 2025
```
Give the user a helpful error instead of showing
connection refused errors.
```
  7359b027
15 May, 2025 2 commits
- Fix lingering Q4_0 help reference (#10720) · 27da2cdd
  Daniel Hiltgen authored May 15, 2025
  
  27da2cdd
- cmd: add ellipses to truncated show metadata (#10717) · feb8923a
  Bruce MacDonald authored May 15, 2025
```
When a piece of information has been truncated in the show output an ellipses to indicate that more data has not been displayed
```
  feb8923a
13 May, 2025 1 commit
- server: add webp image input support (#10653) · c7f4ae7b
  Jeffrey Morgan authored May 12, 2025
  
  c7f4ae7b
10 May, 2025 1 commit
- cmd: strip single quotes from image page (#10636) · 3fa78598
  Bruce MacDonald authored May 09, 2025
  
  3fa78598
08 May, 2025 1 commit
- lint: enable usetesting, disable tenv (#10594) · 6e9a7a25
  Michael Yang authored May 08, 2025
  
  6e9a7a25
06 May, 2025 1 commit

Move quantization to new backend (#10363) · 42481045

Daniel Hiltgen authored May 06, 2025

* Move quantization logic to GGML via new backend

This moves the model aware logic to Go code and calls GGMLs quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.

42481045

05 May, 2025 1 commit
- create blobs in parallel (#10135) · d931ee8f
  Michael Yang authored May 05, 2025
```
* default max term height
* error on out of tree files
```
  d931ee8f
28 Apr, 2025 1 commit
- Revert "increase default context length to 4096 (#10364)" · dd93e1af
  Devon Rifkin authored Apr 28, 2025
```
This reverts commit 424f6486.
```
  dd93e1af
22 Apr, 2025 1 commit

increase default context length to 4096 (#10364) · 424f6486

Devon Rifkin authored Apr 22, 2025

* increase default context length to 4096

We lower the default numParallel from 4 to 2 and use these "savings" to
double the default context length from 2048 to 4096.

We're memory neutral in cases when we previously would've used
numParallel == 4, but we add the following mitigation to handle some
cases where we would have previously fallen back to 1x2048 due to low
VRAM: we decide between 2048 and 4096 using a runtime check, choosing
2048 if we're on a one GPU system with total VRAM of <= 4 GB. We
purposefully don't check the available VRAM because we don't want the
context window size to change unexpectedly based on the available VRAM.

We plan on making the default even larger, but this is a relatively
low-risk change we can make to quickly double it.

* fix tests

add an explicit context length so they don't get truncated. The code
that converts -1 from being a signal for doing a runtime check isn't
running as part of these tests.

* tweak small gpu message

* clarify context length default

also make it actually show up in `ollama serve --help`

424f6486

20 Apr, 2025 1 commit
- cmd: add support for escaping ~ in filepath (#10339) · 08065216
  greengrass821 authored Apr 21, 2025
```
Co-authored-by: tooth paste <tooth_paste91@Poorneshwars-MacBook-Pro.local>
```
  08065216
16 Apr, 2025 1 commit

cmd: add retry/backoff (#10069) · 1e7f62cb

Blake Mizerany authored Apr 15, 2025

This commit adds retry/backoff to the registry client for pull requests.

Also, revert progress indication to match original client's until we can
"get it right."

Also, make WithTrace wrap existing traces instead of clobbering them.
This allows clients to compose traces.

1e7f62cb

14 Apr, 2025 1 commit
- cmd: add missing file close in tests (#10179) · 64a9cc8f
  CYJiang authored Apr 14, 2025
  
  64a9cc8f
08 Apr, 2025 1 commit

cleanup: remove OLLAMA_TMPDIR and references to temporary executables (#10182) · ccc8c677

frob authored Apr 09, 2025



* cleanup: remove OLLAMA_TMPDIR
* cleanup: ollama doesn't use temporary executables anymore

---------
Co-authored-by: Richard Lyons <frob@cloudstaff.com>

ccc8c677

02 Apr, 2025 1 commit

chore(all): replace instances of interface with any (#10067) · 9876c9fa

Bruce MacDonald authored Apr 02, 2025

Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.

9876c9fa

01 Apr, 2025 1 commit

api: return model capabilities from the show endpoint (#10066) · e172f095

Bruce MacDonald authored Apr 01, 2025

With support for multimodal models becoming more varied and common it is important for clients to be able to easily see what capabilities a model has. Retuning these from the show endpoint will allow clients to easily see what a model can do.

e172f095

21 Mar, 2025 1 commit
- fix: show correct bool value for kv in verbose show information (#9928) · 6d110304
  Patrick Devine authored Mar 21, 2025
  
  6d110304
15 Mar, 2025 1 commit

fix: correctly save in interactive mode (#9788) · 2c8b4846

Patrick Devine authored Mar 15, 2025

This fixes the case where a FROM line in previous modelfile points to a
file which may/may not be present in a different ollama instance. We
shouldn't be relying on the filename though and instead just check if
the FROM line was instead a valid model name and point to that instead.

2c8b4846

13 Mar, 2025 1 commit

add verbose mode to the show command (#9640) · 4bed7392

Patrick Devine authored Mar 13, 2025

Add metadata and tensor information to the show command to be able to
see more information about a model. This outputs the same data as
shown on the model details page on ollama.com

4bed7392

12 Mar, 2025 1 commit
- cli: don't exit for invalid model during /load. (#9576) · b3af953a
  frob authored Mar 12, 2025
```
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
```
  b3af953a
04 Mar, 2025 2 commits

ml/backend/ggml: consolidate system info logging · 05a01fde

Michael Yang authored Feb 28, 2025

- output backend system info when initializing the backend. this ensures
  this information is always present without needing to be called
  explicitly
- convert to structured logging
- enumerate devices rather than backends since devices are ordered
- track device indices grouped by device name

05a01fde

New engine: vision models and auto-fallback (#9113) · 1fdb351c

Daniel Hiltgen authored Mar 04, 2025

* Include unified vision layers in memory prediction

For newer vision models with a single gguf, include
the projection estimates.

* Adjust CLI to handle both styles of vision model metadata

* Wire up new tokenizers for new engine

If we're loading the new engine, utilize the new model
text processor instead of calling into cgo wrappers for
llama.cpp.  This also cleans up some tech debt from the
older tokenization flow for the C++ server which was
no longer used.

This also adjusts the grammar handling logic to pass
through to the new engine instead of utilizing the cgo
schema to grammar call.

* Lay foundation for auto selection of new engine

1fdb351c

03 Mar, 2025 1 commit
- cmd: add default err return for stop (#9458) · d25efe39
  CYJiang authored Mar 04, 2025
  
  d25efe39
19 Feb, 2025 1 commit
- test: add test cases for ListHandler (#9146) · d721a02e
  yuiseki authored Feb 20, 2025
  
  d721a02e
14 Feb, 2025 1 commit

Runner for Ollama engine · ed443a03

Jesse Gross authored Dec 17, 2024

This provides integration with the new Ollama engine
(58245413 next ollama runner (#7913)) and the rest of the Ollama
infrastructure such as the runner and Ollama server.

In addition, it also builds out the KV cache infrastructure to
support requirements of how Ollama runs models such as:
 - Parallel processing
 - Memory management for defragmentation and shifting
 - Multi-modal modals

Both old and new engines continue to be supported. By default, only
the old engine is used. To enable the new engine:

Start the server with the OLLAMA_NEW_ENGINE environment variable set:
OLLAMA_NEW_ENGINE=1 ./ollama serve

Start a model that is supported by the Ollama engine. This one is Llama 3.1 8b Q4_K_M:
./ollama run jessegross/llama3.1

ed443a03

16 Jan, 2025 1 commit
- fix default modelfile for create (#8452) · a420a453
  Patrick Devine authored Jan 16, 2025
  
  a420a453
11 Jan, 2025 1 commit
- make the modelfile path relative for `ollama create` (#8380) · 32bd37ad
  Patrick Devine authored Jan 10, 2025
  
  32bd37ad
09 Jan, 2025 1 commit
- show a more descriptive error in the client if it is newer than the server (#8351) · 8bccae4f
  Patrick Devine authored Jan 09, 2025
  
  8bccae4f
01 Jan, 2025 1 commit
- Update the /api/create endpoint to use JSON (#7935) · 86a622cb
  Patrick Devine authored Dec 31, 2024
```
Replaces `POST /api/create` to use JSON instead of a Modelfile.

This is a breaking change.
```
  86a622cb