1. 20 May, 2025 1 commit
  2. 16 May, 2025 1 commit
  3. 14 May, 2025 1 commit
  4. 12 May, 2025 1 commit
  5. 10 May, 2025 1 commit
  6. 08 May, 2025 1 commit
  7. 06 May, 2025 1 commit
    • Daniel Hiltgen
      Move quantization to new backend (#10363) · 42481045
      Daniel Hiltgen authored
      * Move quantization logic to GGML via new backend
      
      This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
      
      * Remove "add model quantizations"
      
      This is no longer needed now that quantization is implemented in Go+GGML code directly.
  8. 05 May, 2025 2 commits
  9. 24 Apr, 2025 1 commit
  10. 16 Apr, 2025 1 commit
  11. 31 Mar, 2025 1 commit
    • Bruce MacDonald
      runner: clear cache when shift is not possible (#9433) · 66b25392
      Bruce MacDonald authored
      Clear KV cache when shift operation is not supported by model.
      Added KvCacheCanShift() check to handle models that can't perform cache shifts,
      falling back to full cache clear while preserving logical token history to
      maintain expected behavior when context window fills up.
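The fallback described above can be sketched as follows. This is a minimal illustration, not the actual runner code: `Cache`, `makeRoom`, and the field names are hypothetical stand-ins for the types in Ollama's runner package, with only `KvCacheCanShift` taken from the commit message.

```go
package main

import "fmt"

// Cache is a hypothetical stand-in for the runner's sequence state; the
// real types wrap llama.cpp's KV cache.
type Cache struct {
	cells    []int // tokens currently held in the KV cache
	history  []int // logical token history for the sequence
	canShift bool  // whether the model supports KV cache shifts
}

// KvCacheCanShift mirrors the check the commit adds.
func (c *Cache) KvCacheCanShift() bool { return c.canShift }

// makeRoom frees space when the context window fills up: shift out the
// oldest tokens when the model supports it, otherwise fall back to a
// full cache clear while preserving the logical token history.
func (c *Cache) makeRoom(discard int) {
	if c.KvCacheCanShift() {
		c.cells = c.cells[discard:]
		c.history = c.history[discard:]
		return
	}
	c.cells = c.cells[:0] // full clear; history stays intact for re-decode
}

func main() {
	c := &Cache{cells: []int{1, 2, 3, 4}, history: []int{1, 2, 3, 4}, canShift: false}
	c.makeRoom(2)
	fmt.Println(len(c.cells), len(c.history)) // cache cleared, history kept
}
```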
  12. 10 Mar, 2025 1 commit
  13. 04 Mar, 2025 1 commit
    • Michael Yang
      ml/backend/ggml: consolidate system info logging · 05a01fde
      Michael Yang authored
      - output backend system info when initializing the backend. this ensures
        this information is always present without needing to be called
        explicitly
      - convert to structured logging
      - enumerate devices rather than backends since devices are ordered
      - track device indices grouped by device name
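The last two points can be sketched with Go's structured logger. This is a hedged illustration, not the backend's code: the device names and the `indexByName` grouping are assumptions standing in for ggml's device enumeration.

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stdout, nil))

	// Hypothetical enumeration result: devices are ordered, so indices
	// are meaningful; two GPUs share the same device name.
	devices := []string{"NVIDIA A100", "NVIDIA A100", "CPU"}

	// Track device indices grouped by device name.
	indexByName := map[string][]int{}
	for i, d := range devices {
		indexByName[d] = append(indexByName[d], i)
	}

	// Structured logging: key/value pairs instead of formatted strings,
	// emitted once at backend initialization.
	logger.Info("system info", "devices", devices, "indices", indexByName)
}
```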
  14. 28 Feb, 2025 1 commit
  15. 27 Feb, 2025 2 commits
  16. 06 Feb, 2025 1 commit
  17. 31 Jan, 2025 1 commit
  18. 30 Jan, 2025 1 commit
  19. 29 Jan, 2025 1 commit
    • Michael Yang
      next build (#8539) · dcfb7a10
      Michael Yang authored
      
      
      * add build to .dockerignore
      
      * test: only build one arch
      
      * add build to .gitignore
      
      * fix ccache path
      
      * filter amdgpu targets
      
      * only filter if autodetecting
      
      * Don't clobber gpu list for default runner
      
      This ensures the GPU specific environment variables are set properly
      
      * explicitly set CXX compiler for HIP
      
      * Update build_windows.ps1
      
      This isn't complete, but is close.  Dependencies are missing, and it only builds the "default" preset.
      
      * build: add ollama subdir
      
      * add .git to .dockerignore
      
      * docs: update development.md
      
      * update build_darwin.sh
      
      * remove unused scripts
      
      * llm: add cwd and build/lib/ollama to library paths
      
      * default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS
      
      * add additional cmake output vars for msvc
      
      * interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12
      
      * remove unnecessary filepath.Dir, cleanup
      
      * add hardware-specific directory to path
      
      * use absolute server path
      
      * build: linux arm
      
      * cmake install targets
      
      * remove unused files
      
      * ml: visit each library path once
      
      * build: skip cpu variants on arm
      
      * build: install cpu targets
      
      * build: fix workflow
      
      * shorter names
      
      * fix rocblas install
      
      * docs: clean up development.md
      
      * consistent build dir removal in development.md
      
      * silence -Wimplicit-function-declaration build warnings in ggml-cpu
      
      * update readme
      
      * update development readme
      
      * llm: update library lookup logic now that there is one runner (#8587)
      
      * tweak development.md
      
      * update docs
      
      * add windows cuda/rocm tests
      
      ---------
      Co-authored-by: jmorganca <jmorganca@gmail.com>
      Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
  20. 08 Jan, 2025 1 commit
  21. 13 Dec, 2024 1 commit
  22. 11 Dec, 2024 2 commits
    • Blake Mizerany
      llama: preserve field order in user-defined JSON schemas (#8002) · 9039c821
      Blake Mizerany authored
      Previously we decoded and re-encoded JSON schemas during validation,
      which served no purpose since json.RawMessage already validates JSON
      syntax. Worse, the re-encoding lost field ordering from the original
      schema, which affects inference quality during step-by-step reasoning.
      
      While fixing this ordering issue by using json.RawMessage directly,
      testing revealed that schema_to_grammar (from llama.cpp) also fails to
      preserve field order during grammar generation. This appears to be the
      root cause of inference degradation.
      
      This change prevents us from mangling the user's original schema order,
      but we still need to address the ordering issue in schema_to_grammar.
      That will be a separate change.
      
      Updates #7978
    • 527cc978
      Jeffrey Morgan authored
  23. 10 Dec, 2024 2 commits
    • Daniel Hiltgen
      Remove unused runner CpuFeatures (#8032) · b9ccb374
      Daniel Hiltgen authored
      The final implementation of #7499 removed dynamic vector requirements
      in favor of a simpler filename-based model; this was leftover logic
      that is no longer needed.
    • Daniel Hiltgen
      build: Make target improvements (#7499) · 4879a234
      Daniel Hiltgen authored
      * llama: wire up builtin runner
      
      This adds a new entrypoint into the ollama CLI to run the cgo built runner.
      On Mac arm64, this will have GPU support, but on all other platforms it will
      be the lowest common denominator CPU build.  After we fully transition
      to the new Go runners more tech-debt can be removed and we can stop building
      the "default" runner via make and rely on the builtin always.
      
      * build: Make target improvements
      
      Add a few new targets and help for building locally.
      This also adjusts the runner lookup to favor local builds, then
      runners relative to the executable, and finally payloads.
      
      * Support customized CPU flags for runners
      
      This implements a simplified custom CPU flags pattern for the runners.
      When built without overrides, the runner name contains the vector flag
      we check for (AVX) to ensure we don't try to run on unsupported systems
      and crash.  If the user builds a customized set, we omit the naming
      scheme and don't check for compatibility.  This avoids checking
      requirements at runtime, so that logic has been removed as well.  This
      can be used to build GPU runners with no vector flags, or CPU/GPU
      runners with additional flags (e.g. AVX512) enabled.
      
      * Use relative paths
      
      If the user checks out the repo in a path that contains spaces, Make gets
      really confused, so use relative paths for everything in-repo to avoid breakage.
      
      * Remove payloads from main binary
      
      * install: clean up prior libraries
      
      This removes support for v0.3.6 and older versions (before the tar bundle)
      and ensures we clean up prior libraries before extracting the bundle(s).
      Without this change, runners and dependent libraries could leak when we
      update and lead to subtle runtime errors.
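The naming-scheme check from the "customized CPU flags" change above can be sketched like this. Everything here is hypothetical: `pickRunner`, the runner names, and `hostHasAVX` (a stand-in for a real CPU-feature probe) are illustrative, not the actual lookup code.

```go
package main

import (
	"fmt"
	"strings"
)

// pickRunner selects the first usable runner. A runner whose name
// encodes a vector flag (e.g. "cpu_avx") is only chosen when the host
// supports that flag, avoiding a crash on unsupported systems.
// Custom-built runners omit the naming scheme and skip the check.
func pickRunner(runners []string, hostHasAVX bool) string {
	for _, r := range runners {
		if strings.Contains(r, "avx") && !hostHasAVX {
			continue // would crash on this host; try the next runner
		}
		return r
	}
	return ""
}

func main() {
	fmt.Println(pickRunner([]string{"cpu_avx", "cpu"}, false)) // cpu
	fmt.Println(pickRunner([]string{"cpu_avx", "cpu"}, true))  // cpu_avx
}
```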
  24. 05 Dec, 2024 1 commit
  25. 03 Dec, 2024 1 commit
  26. 20 Nov, 2024 1 commit
    • Jesse Gross
      runner.go: Retry decoding after defragmentation if needed · 7121dfa3
      Jesse Gross authored
      Fragmentation of the KV cache can occur due to cache shifting or
      different sequences getting processed. Decode uses a heuristic to
      decide if it should defrag. However, this heuristic isn't 100%
      accurate, so decoding can sometimes fail by surprise.
      
      For these cases, if decode indicates that there is no KV cache space,
      we should defrag and then try again.
  27. 19 Nov, 2024 1 commit
  28. 14 Nov, 2024 1 commit
  29. 12 Nov, 2024 1 commit
  30. 02 Nov, 2024 2 commits
    • Jesse Gross
      llama: Improve error handling · 312d9de1
      Jesse Gross authored
      Check for NULL return values from llama.cpp in more places and
      convert them into Go errors, which should make debugging easier
      in the future rather than having hidden surprises in our data
      structures.
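The pattern is the usual cgo-boundary idiom: check the pointer at the call site and surface a Go error. A minimal sketch, where `loadModel` and `newModel` are hypothetical stand-ins for the actual bindings:

```go
package main

import "fmt"

type model struct{}

// loadModel stands in for a cgo call into llama.cpp that returns a
// NULL pointer on failure rather than an error value.
func loadModel(path string) *model {
	return nil // simulate llama.cpp returning NULL
}

// newModel wraps the C call and converts a NULL result into a Go
// error, so the failure surfaces immediately instead of hiding in a
// data structure.
func newModel(path string) (*model, error) {
	m := loadModel(path)
	if m == nil {
		return nil, fmt.Errorf("unable to load model: %s", path)
	}
	return m, nil
}

func main() {
	if _, err := newModel("missing.gguf"); err != nil {
		fmt.Println(err)
	}
}
```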
    • Jesse Gross
      runner.go: Only allocate 1 element embedding batches for mllama · a103dae0
      Jesse Gross authored
      Mllama has large embeddings (100 MB per image) and each embedding is
      represented as 1 token when passed to llama.cpp. Batches are
      pre-allocated for the size of the tokens times the batch size, so this
      results in allocations of over 50 GB at the default batch size.
      On some systems, these mallocs will fail.
      
      Since an image is represented as a single token and mllama doesn't
      support more than 1 image per request, we only need to allocate a
      batch size of 1, which is much more reasonable. In addition, for
      non-multimodal models, we don't need to allocate the embedding
      batches at all.
      
      Fixes #7464
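The arithmetic behind the commit is worth spelling out. Assuming a 100 MB embedding per image token and Ollama's default batch size of 512 (the numbers the message implies), the pre-allocation works out to 50 GiB, versus 100 MB at a batch size of 1:

```go
package main

import "fmt"

// embeddingBatchBytes models the pre-allocation the commit describes:
// space for one embedding per token slot in the batch.
func embeddingBatchBytes(batchSize, bytesPerEmbedding int) int {
	return batchSize * bytesPerEmbedding
}

func main() {
	const mb = 1 << 20
	// Default batch size with ~100 MB image embeddings: far too large.
	fmt.Println(embeddingBatchBytes(512, 100*mb)/(1<<30), "GiB") // 50 GiB
	// Batch size 1, as the commit allocates for mllama.
	fmt.Println(embeddingBatchBytes(1, 100*mb)/mb, "MB") // 100 MB
}
```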
  31. 30 Oct, 2024 3 commits
  32. 29 Oct, 2024 1 commit
  33. 24 Oct, 2024 1 commit