Commits · 8852220f59c12cf3165f5643d38453ffecbb722d · OpenDAS / ollama

18 Dec, 2025 1 commit
- add REQUIRES command to Modelfile (#13361) · 8852220f
  Jeffrey Morgan authored Dec 18, 2025
  
16 Dec, 2025 1 commit

types: ConfigV2 and RootFS (#13504) · 45c47393

Bruce MacDonald authored Dec 16, 2025

Refactored the ConfigV2 and RootFS types from server/images.go into a new types/model/config.go file under the model package. Updated all references to use model.ConfigV2 and model.RootFS. This lets other projects use these types without having to compile the C code in the llama package.

27 Oct, 2025 1 commit

create: inherit FROM model's renderer/parser · 1bdd8169

Devon Rifkin authored Oct 27, 2025

On main, the `RENDERER` and `PARSER` fields from the `Modelfile` don't
get propagated to a new model created with a `req.From` parameter. This
is easily triggered by running `ollama run qwen3-coder` and then a save
command such as `/save qwen3-coder-custom`.

Added a regression test for this, then opened the config for the
"from" model in order to use its renderer/parser as the default for the
new model. This fixes both CLI and API-based creates.

Fixes: https://github.com/ollama/ollama/issues/12792
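
The defaulting behavior described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual server code: the `modelConfig` type and the `inheritFrom` function are hypothetical names, and the real config carries many more fields.

```go
package main

import "fmt"

// modelConfig is a hypothetical, trimmed-down stand-in for a model's stored
// configuration; only the two fields discussed here are included.
type modelConfig struct {
	Renderer string
	Parser   string
}

// inheritFrom fills in any renderer/parser left empty on the new model with
// the values from the "from" model. Values set explicitly on the new model
// are kept as they are.
func inheritFrom(newCfg, fromCfg modelConfig) modelConfig {
	if newCfg.Renderer == "" {
		newCfg.Renderer = fromCfg.Renderer
	}
	if newCfg.Parser == "" {
		newCfg.Parser = fromCfg.Parser
	}
	return newCfg
}

func main() {
	base := modelConfig{Renderer: "qwen3-coder", Parser: "qwen3-coder"}
	custom := inheritFrom(modelConfig{}, base)
	fmt.Printf("renderer=%s parser=%s\n", custom.Renderer, custom.Parser)
}
```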

17 Sep, 2025 1 commit
- engine: add remote proxy (#12307) · 8b894933
  Patrick Devine authored Sep 17, 2025
  
15 Sep, 2025 1 commit

add qwen3-coder tool support · 47991940

Devon Rifkin authored Sep 11, 2025

The format qwen3-coder uses is unusual, both in rendering and in
parsing. To implement parsing, I wrote a custom parser in a style
similar to harmony. For the rendering, I found that the logic would be
much more difficult to follow in a template, so I introduced the concept
of a built-in renderer that uses Go code rather than a template to
generate prompts.

I set us up for future built-in parsers and renderers by making it so
they can be specified in a Modelfile like so:

```
RENDERER "qwen3-coder"
PARSER "qwen3-coder"
```

These need to be provided explicitly because the architecture alone is
not enough to understand what format the model expects to receive, and
what format we expect it to output (e.g., qwen3-coder is `qwen3moe`,
which includes other qwen3-family models as well).
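
One way to picture a name-keyed built-in is a small registry that is looked up by the `RENDERER` value rather than by architecture. The sketch below is an assumption-laden illustration: `renderFunc`, `builtinRenderers`, and `lookupRenderer` are hypothetical names, and the prompt format is a placeholder, not the real qwen3-coder format.

```go
package main

import (
	"errors"
	"fmt"
)

// renderFunc turns a single user message into a prompt string. A real
// renderer would take a full chat history and tool definitions.
type renderFunc func(userMessage string) string

// builtinRenderers is a hypothetical registry keyed by the RENDERER name.
var builtinRenderers = map[string]renderFunc{
	"qwen3-coder": func(msg string) string {
		// Placeholder output for illustration only.
		return "<user>" + msg + "</user>"
	},
}

var errUnknownRenderer = errors.New("unknown renderer")

// lookupRenderer resolves a renderer by its explicit name. An architecture
// name such as "qwen3moe" is not a key, because several models with
// different formats can share one architecture.
func lookupRenderer(name string) (renderFunc, error) {
	r, ok := builtinRenderers[name]
	if !ok {
		return nil, fmt.Errorf("%w: %q", errUnknownRenderer, name)
	}
	return r, nil
}

func main() {
	if r, err := lookupRenderer("qwen3-coder"); err == nil {
		fmt.Println(r("hello"))
	}
	_, err := lookupRenderer("qwen3moe")
	fmt.Println(err)
}
```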

I haven't converted harmony to be one of these "built-ins" yet, since
some of it is in flux with the changes @ParthSareen has been making to
move harmony to the runner. It is likely that many other built-ins will
need to move to the runner as well, but I'm able to slightly defer that
decision since qwen3-coder doesn't have thinking (and therefore doesn't
need to be in the runner to make structured outputs work). I expect to
unify harmony with this approach very soon.

Whether a particular model supports tools or thinking was previously
inferred from templates, but without a template we now also use the
parser itself to declare what it supports. If we have future models that
re-use the same parsing format, but have different capabilities, we'll
want to parameterize them and give them different names to be specified
as a `PARSER`.
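
A parser that declares its own capabilities might look like the sketch below. The interface and method names are hypothetical and chosen for illustration; the tool and thinking values follow what this commit message says about qwen3-coder.

```go
package main

import "fmt"

// parser is a hypothetical interface through which a built-in parser
// declares what the model format supports.
type parser interface {
	HasToolSupport() bool
	HasThinkingSupport() bool
}

// qwen3CoderParser supports tools but not thinking, per the commit message.
type qwen3CoderParser struct{}

func (qwen3CoderParser) HasToolSupport() bool     { return true }
func (qwen3CoderParser) HasThinkingSupport() bool { return false }

// capabilities asks the parser what it supports, which is useful when there
// is no template to infer capabilities from.
func capabilities(p parser) []string {
	var caps []string
	if p.HasToolSupport() {
		caps = append(caps, "tools")
	}
	if p.HasThinkingSupport() {
		caps = append(caps, "thinking")
	}
	return caps
}

func main() {
	fmt.Println(capabilities(qwen3CoderParser{}))
}
```

A future model that reuses the same parsing format with different capabilities would then get its own named type with different return values.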

Misc changes:

- I worked on the renderer by diffing outputs from the reference
  implementation and ours. To make this easier, I extended
  <https://github.com/ollama/ollama/pull/11875> to also support
  returning the prompt via the OpenAI compatibility layer.

21 May, 2025 1 commit

remove support for multiple ggufs in a single file (#10722) · 61aeaf7e

Michael Yang authored May 21, 2025

* remove support for multiple ggufs in a single file

This was an attempt to make it easier to import multimodal models into
Ollama. It was rarely used and error-prone, so remove it.

* fix: create fused model from blob

19 May, 2025 2 commits

avoid kv truncation during create (#10761) · 1a0cfd08
Daniel Hiltgen authored May 19, 2025

ggml: Separate tensor load from backend creation · 94ab428e

Jesse Gross authored Apr 17, 2025

Currently, tensors are loaded at the same time the backend is created,
which is a slow operation. This separates the work into two steps:
 - Create the backend, including enumerating tensors and allocating memory
 - Load the tensor data

This allows more flexibility in managing model loading.
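
The two-step split can be sketched as follows. This is a toy illustration, not the ggml backend: the `backend` type, `newBackend`, and `load` are hypothetical names, and a map stands in for real memory allocation.

```go
package main

import "fmt"

// backend is a hypothetical stand-in for a tensor backend.
type backend struct {
	names []string          // tensor names enumerated at creation time
	data  map[string][]byte // tensor contents, filled in later by load
}

// newBackend is the cheap step: it records which tensors exist and reserves
// a place to hold them, but reads no tensor data.
func newBackend(names []string) *backend {
	return &backend{names: names, data: make(map[string][]byte, len(names))}
}

// load is the slow step: it reads the data for every enumerated tensor using
// the supplied read function.
func (b *backend) load(read func(name string) ([]byte, error)) error {
	for _, name := range b.names {
		buf, err := read(name)
		if err != nil {
			return fmt.Errorf("load tensor %q: %w", name, err)
		}
		b.data[name] = buf
	}
	return nil
}

// loaded reports how many tensors currently have data.
func (b *backend) loaded() int { return len(b.data) }

func main() {
	b := newBackend([]string{"tok_embd", "output"})
	fmt.Println("known:", len(b.names), "loaded:", b.loaded())
	err := b.load(func(name string) ([]byte, error) { return []byte(name), nil })
	fmt.Println("loaded:", b.loaded(), "err:", err)
}
```

Because creation no longer implies loading, a caller can inspect the enumerated tensors first and decide when, or whether, to pay the loading cost.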

14 May, 2025 1 commit

fix crash in old clients with quantization progress (#10710) · ff80718e

Daniel Hiltgen authored May 14, 2025

Older clients assumed the digest was at least 19 characters long, so increase the size
of the dummy digest to avoid array-out-of-bounds crashes.
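
To illustrate why a short dummy digest can crash a client, the sketch below assumes an older client abbreviates digests by slicing fixed offsets. Both functions are hypothetical, and the `sha256:` prefix and the `[7:19]` slice are assumptions used to arrive at the 19-character minimum.

```go
package main

import (
	"fmt"
	"strings"
)

// minDigestLen is the length the older client is assumed to rely on:
// a 7-character "sha256:" prefix plus 12 more characters.
const minDigestLen = 19

// shortDigest mimics a hypothetical older client. It panics with a slice
// bounds error if the digest is shorter than 19 characters.
func shortDigest(digest string) string {
	return digest[7:19]
}

// padDigest right-pads a dummy digest with zeros until it is long enough
// for shortDigest to slice safely.
func padDigest(digest string) string {
	if n := minDigestLen - len(digest); n > 0 {
		digest += strings.Repeat("0", n)
	}
	return digest
}

func main() {
	dummy := padDigest("sha256:")
	fmt.Println(len(dummy), shortDigest(dummy))
}
```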

12 May, 2025 1 commit

convert: quantize from safetensors needs kv (#10675) · ad035ad5

Bruce MacDonald authored May 12, 2025

When creating a quantized model from safetensors, we
need the array KV values to be loaded. Changing this
value to -1 loads the KV values on the returned
layer so they can be used and saved during quantization.

06 May, 2025 1 commit

Move quantization to new backend (#10363) · 42481045

Daniel Hiltgen authored May 06, 2025

* Move quantization logic to GGML via new backend

This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.

25 Apr, 2025 1 commit
- explicitly decode maxarraysize 1024 · 340448d2
  Michael Yang authored Apr 25, 2025
  
19 Apr, 2025 1 commit

create tempdir in models directory · 88738b35

Michael Yang authored Apr 18, 2025

The models directory should have plenty of storage, and creating the
temporary directory there also ensures there's no cross-device copy.
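
A minimal sketch of the idea, assuming the final step is a rename into the models directory: staging files in a temporary directory on the same filesystem lets `os.Rename` move them without copying. The function names and the `ollama-create-` prefix are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// stageAndPublish writes data into a temporary directory created inside
// modelsDir, then renames the file into its final place. Source and
// destination share a filesystem, so the rename is not a cross-device copy.
func stageAndPublish(modelsDir, name string, data []byte) (string, error) {
	tmpDir, err := os.MkdirTemp(modelsDir, "ollama-create-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(tmpDir) // clean up the staging directory

	tmpFile := filepath.Join(tmpDir, name)
	if err := os.WriteFile(tmpFile, data, 0o644); err != nil {
		return "", err
	}
	final := filepath.Join(modelsDir, name)
	if err := os.Rename(tmpFile, final); err != nil {
		return "", err
	}
	return final, nil
}

// demo runs stageAndPublish against a throwaway models directory.
func demo() error {
	modelsDir, err := os.MkdirTemp("", "models-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(modelsDir)

	final, err := stageAndPublish(modelsDir, "blob", []byte("weights"))
	if err != nil {
		return err
	}
	got, err := os.ReadFile(final)
	if err != nil {
		return err
	}
	if string(got) != "weights" {
		return fmt.Errorf("unexpected contents %q", got)
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```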

01 Mar, 2025 1 commit

server: validate local path on safetensor create (#9379) · bebb6823

Bruce MacDonald authored Feb 28, 2025

Adds more validation during the safetensor creation process:

- Properly handles relative paths (like `./model.safetensors`) while rejecting absolute paths
- Adds comprehensive test coverage for various paths
- Makes no functional changes for valid inputs; existing workflows remain unaffected
- Leverages Go 1.24's new `os.Root` functionality for secure containment
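
The accept/reject rule for paths can be illustrated with a purely lexical check. Note that this sketch swaps in `filepath.IsLocal` (available since Go 1.20) as a stand-in; it is not the `os.Root` mechanism the commit uses, which enforces containment at file-open time. The `validateRelPath` function is hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// validateRelPath accepts relative paths that stay inside the base directory
// and rejects absolute paths and paths that escape via "..". This is a
// lexical check only; it does not follow symlinks.
func validateRelPath(p string) error {
	if !filepath.IsLocal(p) {
		return fmt.Errorf("path %q must be relative and stay inside the base directory", p)
	}
	return nil
}

func main() {
	for _, p := range []string{"./model.safetensors", "/tmp/model.safetensors", "../model.safetensors"} {
		fmt.Printf("%-26s rejected=%v\n", p, validateRelPath(p) != nil)
	}
}
```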

14 Feb, 2025 1 commit

next ollama runner (#7913) · 58245413

Michael Yang authored Feb 14, 2025

feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces three high-level interfaces and several smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`.
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc.) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`.
- `ml.Tensor` defines the interface for a tensor and tensor operations.
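
How the three interfaces relate can be sketched with heavily simplified stand-ins. The method sets below are assumptions made for illustration and do not match the real signatures in `model/model.go` or `ml/backend.go`; the `vec`, `mapBackend`, and `biasModel` types are toy implementations.

```go
package main

import (
	"errors"
	"fmt"
)

// Tensor is a simplified stand-in for a tensor and its operations.
type Tensor interface {
	Shape() []int
	Add(other Tensor) Tensor
}

// Backend is a simplified stand-in that gives models access to loaded tensors.
type Backend interface {
	Get(name string) Tensor
}

// Model is a simplified stand-in for a model architecture.
type Model interface {
	Forward(b Backend, input Tensor) (Tensor, error)
}

// vec is a toy one-dimensional tensor.
type vec []float32

func (v vec) Shape() []int { return []int{len(v)} }

// Add assumes other is a vec of the same length; anything else panics.
func (v vec) Add(other Tensor) Tensor {
	o := other.(vec)
	out := make(vec, len(v))
	for i := range v {
		out[i] = v[i] + o[i]
	}
	return out
}

// mapBackend is a toy backend holding tensors in memory by name.
type mapBackend map[string]Tensor

func (m mapBackend) Get(name string) Tensor { return m[name] }

// biasModel is a toy model whose forward pass adds a "bias" tensor.
type biasModel struct{}

func (biasModel) Forward(b Backend, input Tensor) (Tensor, error) {
	bias := b.Get("bias")
	if bias == nil {
		return nil, errors.New("missing tensor: bias")
	}
	return input.Add(bias), nil
}

func main() {
	var m Model = biasModel{}
	out, err := m.Forward(mapBackend{"bias": vec{1, 2}}, vec{10, 20})
	fmt.Println(out, err)
}
```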

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

15 Jan, 2025 1 commit
- Fix absolute path names + gguf detection (#8428) · 2539f2db
  Patrick Devine authored Jan 14, 2025
  
09 Jan, 2025 1 commit
- show a more descriptive error in the client if it is newer than the server (#8351) · 8bccae4f
  Patrick Devine authored Jan 09, 2025
  
01 Jan, 2025 1 commit
- Update the /api/create endpoint to use JSON (#7935) · 86a622cb
  Patrick Devine authored Dec 31, 2024

  Changes `POST /api/create` to accept JSON instead of a Modelfile.

  This is a breaking change.
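
For a sense of what a JSON-based create request might look like, the sketch below encodes a small request body. The `createRequest` type and its field names (`model`, `from`, `system`) are assumptions for illustration; consult the API documentation for the actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// createRequest is a hypothetical subset of a JSON create request body.
type createRequest struct {
	Model  string `json:"model"`
	From   string `json:"from,omitempty"`
	System string `json:"system,omitempty"`
}

// encodeCreate serializes the request as it would be sent in the body of
// POST /api/create.
func encodeCreate(req createRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encodeCreate(createRequest{Model: "qwen3-coder-custom", From: "qwen3-coder"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```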