Commits · 76f88caf437c3f48b68ff6df649e283f7370b27b · OpenDAS / ollama

09 Dec, 2025 1 commit
- nomic-embed-text:v2: model implementation (#13162) · 76f88caf
  nicole pardal authored Dec 09, 2025
  
  76f88caf
04 Dec, 2025 1 commit

convert: add deepseek converter (#12980) · 0a844f8e

Patrick Devine authored Dec 04, 2025

This change adds the ability for `ollama create` to convert models that use
the DeepSeek2 architecture (specifically DeepSeekV3 and DeepSeek-R1).

0a844f8e

19 Nov, 2025 1 commit
- deepseekocr · 92981ae3
  Michael Yang authored Oct 31, 2025
  
  92981ae3
29 Oct, 2025 1 commit
- feat(model): add qwen3vl (#12665) · 7d25b9e1
  Michael Yang authored Oct 28, 2025
  
  7d25b9e1
05 Aug, 2025 1 commit

gpt-oss (#11672) · fa7776fd

Michael Yang authored Aug 05, 2025



* bf16

* tests

* gpt-oss

* enable gptoss for engine

* rough estimate

* convert to mxfp4

* handle safetensors U8

* clamp glu/linear

* update tokenizer

* MXFP4 support

This implements the Open Compute Microscaling (MX) FP4 format
as a tensor type with backend implementations focusing
on mulmat and mulmatid on CPU, CUDA, and Metal.

* Unit tests for MXFP4 support

This exercises various operations and shapes on both CPU and GPU (if detected
on the system)

* cuda graph

* unit test adjustments

* cuda: optimize memory access

Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4

* mac: fix crash on old macos versions

cblas_sgemm is only supported on v13.3 and up, however bf16 is
only supported on v14+ so we were falling back to ggml-blas and
crashing on bf16 tensors.  Checking for the function being null
seems to be the simplest way to condittionally avoid registering the
backend.

* server: Minimum context length for gptoss

This model requires a minimum context length of 8192 to function
effectively. Users can set higher values through all normal mechanisms
but lower values will be silently reset.

* ggml: Multiply by numParallel for gptoss sliding window

When computing the graph size estimate, the context size is already
multiplied by numParallel so estimates reflect that. However, since
sliding window models use a smaller, fixed context size, they need
to manually take numParallel into account.

* gpt-oss integration

includes harmony parser and thinking levels, etc.

* fix sync

* fix tests

* fix lint

---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>

fa7776fd

26 Jun, 2025 1 commit

add new gemma model (#11204) · 73b642e6

Michael Yang authored Jun 25, 2025

* update patches

* cherry pick metal mean kernel

* cherry pick cuda mean kernel

* gemma3n

73b642e6

16 May, 2025 1 commit

model: handle multiple eos tokens (#10577) · 333e3604

Michael Yang authored May 16, 2025

* get eos_token_id from generation_config.json

* refactor

* include both ids and strings in trace

* comments

* remove special case for gemma3 special vocab (#10743)

333e3604

14 May, 2025 2 commits
- model: add Qwen2.5-VL support (#10385) · 0aa8b371
  Bruce MacDonald authored May 13, 2025
  
  0aa8b371
- chore: update mllama to use ollama engine (#10637) · 23125648
  Michael Yang authored May 13, 2025
  
  23125648
06 May, 2025 1 commit

Move quantization to new backend (#10363) · 42481045

Daniel Hiltgen authored May 06, 2025

* Move quantization logic to GGML via new backend

This moves the model aware logic to Go code and calls GGMLs quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.

42481045

25 Apr, 2025 2 commits
- llama4 · f0c66e6d
  Michael Yang authored Apr 03, 2025
  
  f0c66e6d
- convert: change to colmajor · 4892872c
  Michael Yang authored Apr 25, 2025
  
  4892872c
03 Apr, 2025 1 commit

model: support for mistral-small in the ollama runner · 6bd0a983

Bruce MacDonald authored Mar 14, 2025

Mistral is a popular research lab making open source models. This updates
the forward pass of llama architecture models to support both llama models
and mistral models by accounting for additional metadata present in mistral
models, and finding the correct dimensions for the output projection.

6bd0a983

18 Mar, 2025 1 commit
- convert: return name of unsupported architecture (#9862) · 61a88252
  Bruce MacDonald authored Mar 18, 2025
```
When a model's architecture cannot be converted return the name of the unsupported arch in the error message.
```
  61a88252
11 Mar, 2025 4 commits
- all: address linter errors · 83f0ec82
  jmorganca authored Mar 11, 2025
  
  83f0ec82
- fix conversion · c62861f4
  Patrick Devine authored Mar 07, 2025
  
  c62861f4
- add gemma vision encoder · 4b037a97
  Michael Yang authored Mar 06, 2025
  
  4b037a97
- gemma2 impl · 5f74d1fd
  Patrick Devine authored Feb 07, 2025
  
  5f74d1fd
14 Feb, 2025 1 commit

next ollama runner (#7913) · 58245413

Michael Yang authored Feb 14, 2025



feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

58245413

16 Jan, 2025 1 commit
- convert: import support for command-r models from safetensors (#6063) · 93a8daf2
  Josh authored Jan 15, 2025
```
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
```
  93a8daf2
14 Jan, 2025 1 commit

convert: qwen2 from safetensors (#8408) · f6f37130

Bruce MacDonald authored Jan 14, 2025

Add native support for converting Qwen2 family models (including Qwen2.5)
from safetensors to gguf format so we can run it.

f6f37130

10 Sep, 2024 1 commit
- catch when model vocab size is set correctly (#6714) · 84b84ce2
  Patrick Devine authored Sep 09, 2024
  
  84b84ce2
23 Aug, 2024 1 commit
- convert safetensor adapters into GGUF (#6327) · 0c819e16
  Patrick Devine authored Aug 23, 2024
  
  0c819e16
21 Aug, 2024 2 commits
- convert gemma2 · 3546bbd0
  Michael Yang authored Jun 28, 2024
  
  3546bbd0
- bert · 5a28b9cf
  Michael Yang authored Jun 06, 2024
  
  5a28b9cf
12 Aug, 2024 1 commit
- add conversion for microsoft phi 3 mini/medium 4k, 128 · 6ffb5cb0
  Michael Yang authored Jun 03, 2024
  
  6ffb5cb0
31 Jul, 2024 3 commits
- convert: only extract large files · eafc607a
  Michael Yang authored Jun 29, 2024
  
  eafc607a
- comments · df993fa3
  Michael Yang authored Jul 08, 2024
  
  df993fa3
- refactor convert · 5e9db9fb
  Michael Yang authored May 31, 2024
  
  5e9db9fb
04 Jun, 2024 1 commit
- lint · e40145a3
  Michael Yang authored May 21, 2024
  
  e40145a3
20 May, 2024 5 commits
- cleanup · bbbd9f20
  Michael Yang authored May 15, 2024
  
  bbbd9f20
- bpe pretokenizer · 547132e8
  Michael Yang authored May 15, 2024
  
  547132e8
- add fixes for llama · d355d202
  Patrick Devine authored May 08, 2024
  
  d355d202
- llama3 conversion · c8cf0d94
  Patrick Devine authored Apr 28, 2024
  
  c8cf0d94
- some changes for llama3 · d88582df
  Patrick Devine authored Apr 18, 2024
  
  d88582df
06 May, 2024 1 commit

quantize any fp16/fp32 model · 9685c345

Michael Yang authored Apr 12, 2024

- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}

9685c345

24 Apr, 2024 1 commit
- add mixtral 8x7b model conversion (#3859) · ce8ce825
  Patrick Devine authored Apr 23, 2024
  
  ce8ce825
15 Apr, 2024 1 commit
- Add llama2 / torch models for `ollama create` (#3607) · 9f8691c6
  Patrick Devine authored Apr 15, 2024
  
  9f8691c6
06 Apr, 2024 1 commit
- no rope parameters · be517e49
  Michael Yang authored Apr 05, 2024
  
  be517e49
01 Apr, 2024 1 commit
- Simplify model conversion (#3422) · 3b6a9154
  Patrick Devine authored Apr 01, 2024
  
  3b6a9154