1. 13 Dec, 2025 1 commit
  2. 12 Dec, 2025 2 commits
  3. 11 Dec, 2025 1 commit
  4. 09 Dec, 2025 2 commits
  5. 08 Dec, 2025 1 commit
    • refactor rope · 603ceefa
      Michael Yang authored
      change to a flatter directory structure and group the options with the
      function
      
      update models to call rope in one place
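      A minimal sketch of the shape this refactor describes, with the options
      grouped next to the single rope entry point that models call. The package
      layout, option names, and rotation details below are illustrative
      assumptions, not ollama's actual code:

          package rope

          import "math"

          // Options groups every rope parameter next to the function that
          // consumes it.
          type Options struct {
              Dims  int     // number of rotary dimensions
              Base  float32 // frequency base, commonly 10000
              Scale float32 // context-extension scaling factor
          }

          // Option mutates Options, so models override only what they need.
          type Option func(*Options)

          func WithBase(b float32) Option  { return func(o *Options) { o.Base = b } }
          func WithScale(s float32) Option { return func(o *Options) { o.Scale = s } }

          // Apply rotates consecutive pairs of x in place for the given
          // position. Every model calls this one entry point instead of
          // duplicating the math.
          func Apply(x []float32, pos int, opts ...Option) {
              o := Options{Dims: len(x), Base: 10000, Scale: 1}
              for _, opt := range opts {
                  opt(&o)
              }
              for i := 0; i+1 < o.Dims; i += 2 {
                  theta := float64(pos) * float64(o.Scale) /
                      math.Pow(float64(o.Base), float64(i)/float64(o.Dims))
                  sin, cos := math.Sincos(theta)
                  x0, x1 := float64(x[i]), float64(x[i+1])
                  x[i], x[i+1] = float32(x0*cos-x1*sin), float32(x0*sin+x1*cos)
              }
          }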
  6. 02 Dec, 2025 1 commit
  7. 20 Nov, 2025 1 commit
  8. 19 Nov, 2025 3 commits
  9. 18 Nov, 2025 1 commit
  10. 13 Nov, 2025 1 commit
  11. 06 Nov, 2025 1 commit
  12. 03 Nov, 2025 1 commit
  13. 30 Oct, 2025 2 commits
  14. 29 Oct, 2025 1 commit
  15. 28 Oct, 2025 2 commits
  16. 18 Oct, 2025 1 commit
  17. 13 Oct, 2025 1 commit
  18. 09 Oct, 2025 2 commits
  19. 03 Oct, 2025 1 commit
  20. 24 Sep, 2025 1 commit
    • Grace/deepseek v3 migration (#12385) · fbd82ba5
      Grace authored
      
      
      * init deepseek model file
      
      * temp removal of flash attention implementation
      
      * shapes are proper, can make a pass
      
      * query, key, value have good cosine similarity, but the max diff is a bit high
      
      * Attention block is working! ** with eager for now, have not added the mask line
      
      * Attention block is working! ** with eager for now, have not added the mask line
      
      * working MoE at around 0.95 cosine sim
      
      * added cosine similarity function (a sketch of such a helper follows this log entry)
      
      * Starting end to end structure
      
      * Trying (and failing) to get rope to work, going to test full thing on tater
      
      * running on tater36... just not the right outputs
      
      * we have the right values for rope... but it's still not working?
      
      * change Extrapolation Factor to 1
      
      * removed adding residuals twice, removed normalization from the shared expert, refactored the Norms (Attention, MLP) to be outside the (Attention, MLP) blocks and in the Transformer block instead, added cache setLayer
      
      * Temporary modelfiles for cpu
      
      * change kpass intermediate step to kv, two layer outputs [0,1] look fine
      
      * this calls for 16 chicken nuggets
      
      * whoops
      
      * cleaning up code
      
      * delete stuff we don't need
      
      * getting rid of debug statements for llama cpp
      
      * working with long contexts
      
      * fix long context view error
      
      * reverting some changes I made to files that are not a part of this PR
      
      * Added proper tokenizer for deepseek3
      
      * clean up model and go test
      
      * remove Modelfile
      
      * not passing the tests
      
      * whoops
      
      * how to pass the ci tests
      
      * resolving some of the comments
      
      * rename
      
      * linted and renamed deepseek3 -> deepseek2
      
      * remove name go
      
      * addressed changes - main change was adopting qwen3 naming scheme
      
      * I cannot with linters
      
      * clean up logs
      
      * clean up logs
      
      ---------
      Co-authored-by: Grace Guo <graceguo@Graces-MBP.localdomain>
      Co-authored-by: Grace Guo <graceguo@Graces-MacBook-Pro.local>
      Co-authored-by: graceguo <graceguo@tater36.localdomain>
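      The cosine-similarity checks mentioned in the bullets above compare the
      ported model's activations against a reference implementation. A minimal
      sketch of such helpers (illustrative only, assuming equal-length slices;
      not the function from this PR):

          package main

          import (
              "fmt"
              "math"
          )

          // cosineSimilarity compares a ported layer's output with a
          // reference: 1.0 means identical direction, ~0.95 is close but
          // not yet an exact port.
          func cosineSimilarity(a, b []float32) float64 {
              var dot, na, nb float64
              for i := range a {
                  dot += float64(a[i]) * float64(b[i])
                  na += float64(a[i]) * float64(a[i])
                  nb += float64(b[i]) * float64(b[i])
              }
              return dot / (math.Sqrt(na) * math.Sqrt(nb))
          }

          // maxAbsDiff reports the largest elementwise deviation, which can
          // stay high even when cosine similarity looks good (as noted for
          // query/key/value above).
          func maxAbsDiff(a, b []float32) float64 {
              var m float64
              for i := range a {
                  if d := math.Abs(float64(a[i]) - float64(b[i])); d > m {
                      m = d
                  }
              }
              return m
          }

          func main() {
              got := []float32{0.98, 2.01, 2.95}
              want := []float32{1, 2, 3}
              fmt.Printf("cos=%.4f maxdiff=%.3f\n",
                  cosineSimilarity(got, want), maxAbsDiff(got, want))
          }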
  21. 23 Sep, 2025 2 commits
  22. 19 Sep, 2025 1 commit
  23. 18 Sep, 2025 1 commit
  24. 17 Sep, 2025 1 commit
  25. 16 Sep, 2025 2 commits
  26. 15 Sep, 2025 2 commits
  27. 04 Sep, 2025 1 commit
  28. 29 Aug, 2025 1 commit
    • perf: build graph for next batch async to keep GPU busy (#11863) · 517807cd
      Daniel Hiltgen authored
      * perf: build graph for next batch in parallel to keep GPU busy
      
      This refactors the main run loop of the ollama runner to perform the
      GPU-intensive tasks (Compute+Floats) in a goroutine, so the next batch
      can be prepared in parallel and the GPU spends less time stalled waiting
      for its next batch of work.
      
      * tests: tune integration tests for ollama engine
      
      This tunes the integration tests to focus more on models supported
      by the new engine.
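      The pattern this commit describes overlaps CPU-side batch preparation
      with the in-flight GPU compute. A minimal sketch of the idea using a
      goroutine and a one-slot channel; the names and structure are
      illustrative, not the runner's actual code:

          package main

          import "fmt"

          type batch struct{ id int }

          // prepareBatch stands in for the CPU-side work of building the
          // graph and inputs for the next batch.
          func prepareBatch(id int) batch { return batch{id: id} }

          // compute stands in for the GPU-heavy Compute+Floats step.
          func compute(b batch) { fmt.Println("computed batch", b.id) }

          func main() {
              const numBatches = 4
              done := make(chan struct{}, 1)
              done <- struct{}{} // no compute is in flight before the first batch

              for i := 0; i < numBatches; i++ {
                  b := prepareBatch(i) // overlaps the previous batch's compute
                  <-done               // wait for the GPU to finish that batch
                  go func(b batch) {
                      compute(b)
                      done <- struct{}{}
                  }(b)
              }
              <-done // wait for the final batch to finish
          }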
  29. 25 Aug, 2025 1 commit
  30. 14 Aug, 2025 1 commit
    • update vendored llama.cpp and ggml (#11823) · 1a19df1f
      Michael Yang authored
      * TEMPORARY: Update the llama.cpp upstream to my fork's Granite Four branch
      
      This will be redone once my branch is merged upstream in llama.cpp
      
      * feat: Update all patches
      
      There are a number that are no longer needed at all:
      
      - 0003-embeddings: Embeddings entirely overhauled on master
      - 0008-ensure-KV-cache-is-fully-defragmented: KV caching entirely
          overhauled on master
      - 0019-metal-add-mean-kernel-14267: Merged upstream
      - 0020-CUDA-add-mean-operation-14313: Merged upstream
      
      * feat: Sync llama.cpp and ggml
      
      * fix: Update rsync-filter for all moved/new/removed files
      
      * fix: Add files missing from sync
      
      * fix: Update ggml rsync-filter for new ggml-cpu/arch subdirs
      
      * fix: Add ggml files missing from sync
      
      * fix: Narrow llama.cpp rsync-filter to not include mtmd main tool cpp files
      
      * fix: Remove mtmd main cpp files
      
      * fix: Add missing include in sampling_ext.cpp
      
      * fix: Update llama.go to use mtmd instead of clip/llava
      
      * fix: Add patch for mtmd_input_text
      
      * chore: Ignore *.patched in the patch directory
      
      * fix: Fix support for arch-specific ggml-cpu source files with new arrangement
      
      In https://github.com/ggml-org/llama.cpp/pull/13892, all arch-specific
      implementations were split out into a nested tree structure under
      ggml-cpu/arch. This conflicts with standard CGO layout where all
      arch-specific source files are expected to live in the same directory as
      the parent Go module and use suffixes based on GOOS and GOARCH. As such,
      there were really two options for getting this to work:
      
      1. Add a patch on top of the GGML sync to rearrange the files to match
      the Go layout convention
      2. Use CGO directives to conditionally include the nested source files in
      the compilation units
      
      This commit does (2) in order to minimize the set of changes needed on top
      of the upstream file layout. To get this to work, there are two key things
      needed:
      
      1. In cpu.go, #cgo directives are added to explicitly set __${GOARCH}__ in
      the preprocessor directives
      2. In arch-impls.c|cpp, use an #ifdef | #elif defined | #endif chain to
      explicitly include the .c|.cpp files for the given architecture from the
      nested directory (a condensed cgo sketch of both pieces follows this log
      entry)
      
      * fix: Use mtmd_helper to correctly load the bitmap for the image
      
      * fix: Apply patch for mtmd_text_input
      
      * fix: Add missing stb to llama.cpp rsync-filter
      
      * fix: Add sync'ed stb vendored header
      
      * fix: Use C++17 and include vendor for Go wrapper modules
      
      * fix: Update patch 0015 for upstream implementation of uuid
      
      * feat: Bump to the latest tip of the branch
      
      * fix: Update patches for bump
      
      * feat: Bump back to the central repo and point at the latest master
      
      This includes granite 4 and a number of other model architectures!
      
      * fix: Revert changes to ggml export GPU UUID patch
      
      * fix: Add patch for GGML_VERSION and GGML_COMMIT constants
      
      * feat: Sync all patched code
      
      * build: Include cmake/common.cmake in ggml sync
      
      * build: Add top-level include for GNUInstallDirs in CMakeLists.txt
      
      This is used to populate CMAKE_INSTALL_BINDIR
      
      * fix: Add a patch to avoid power throttling API on non-msvc windows builds
      
      * fix: Sync patch changes for ggml-cpu.c
      
      * feat: Bump llama.cpp to 4a4f42
      
      This picks up support for Kimi K2 and PLaMO-2
      
      * feat: Sync llama.cpp
      
      * fix: Handle multi-chunk image encodings from mtmd
      
      * fix: Re-number patches after merge with `main`
      
      * feat: Bump to 41e78c in the makefile
      
      * fix: Fix Solar and argsort/copy patches after bump
      
      * fix: Remove Gemma3n CUDA Graphs patch
      
      It was implemented upstream:
      https://github.com/ggml-org/llama.cpp/pull/14741
      
      * feat: Sync llama.cpp / ggml after latest bump
      
      * build: Remove unnecessary CFLAGS definitions in cpu.go
      
      * fix: Remove unnecessary additions in the rsync-filter
      
      * fix: Remove unused vendored code for chat template parsing
      
      * Revert "fix: Remove Gemma3n CUDA Graphs patch"
      
      This reverts commit d724caced3ce21f08924d4b7801f94ce6638f6ea.
      
      * fix: Update 0020 CUDA Graphs for gemma3n to keep both llama.cpp and ollama fixes
      
      https://github.com/ollama/ollama/pull/11195#issuecomment-3137312394
      
      
      
      * fix: Sync ggml-cuda.cu after keeping both styles of CUDA graph fixes for gemma3n
      
      * unwind mxfp4 patch
      
      Prepare to bump ggml with their impl for mxfp4
      
      * bump
      
      * fix windows build error
      
      * Convert tensors at load time
      
      Repack the mxfp4 tensors as ggml's kernels expect them to be.
      
      * convert mlp bf16 to f32
      
      * buffer the conversion better
      
      * reshape earlier
      
      * openai swiglu
      
      * add ids
      
      * split qkv, gate_up
      
      * fix nested alt tags
      
      * fast attention
      
      * remove debug messages
      
      * fix lint
      
      * remove redundant test
      
      * remap values only if source/target are different
      
      * add back i32->i32 copy
      
      * refactor cpu quants
      
      * clean up vendor
      
      * update patch instructions
      
      * clean up patches
      
      * remove webgpu
      
      * update mem
      
      * also handle gpt-oss
      
      * revert convert changes
      
      ---------
      Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
      Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>
      Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
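      The arch-specific CGO arrangement described in option (2) above can be
      condensed into a single sketch. The package name, flags, and file paths
      are illustrative assumptions, not the repository's actual sources:

          // cpu.go (sketch): define __${GOARCH}__ for the preprocessor, then
          // let an #ifdef chain include the nested arch-specific sources.
          package cpu

          /*
          #cgo amd64 CFLAGS: -D__amd64__
          #cgo arm64 CFLAGS: -D__arm64__

          // Condensed stand-in for arch-impls.c|cpp: include the .c files
          // for the architecture being compiled, and only those.
          #if defined(__amd64__)
          #include "arch/amd64/quants.c"
          #elif defined(__arm64__)
          #include "arch/arm64/quants.c"
          #endif
          */
          import "C"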