- 29 May, 2025 1 commit
- Jesse Gross authored
This enables matching up devices and information reported by the backend with system management libraries such as nvml to get accurate free memory reporting.
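For illustration, a minimal Go sketch of the kind of matching this enables; the backendDevice and nvmlDevice types and the UUID-based matching below are assumptions, not the actual ollama or NVML API:

```go
package main

import "fmt"

// backendDevice is a stand-in for the device info a compute backend reports;
// the UUID field is the key used to line it up with management libraries.
type backendDevice struct {
	Name string
	UUID string
}

// nvmlDevice is a stand-in for what a management library such as NVML reports.
type nvmlDevice struct {
	UUID    string
	FreeMem uint64 // bytes
}

// freeMemoryByUUID matches backend devices to management-library entries by UUID
// and returns the free memory for each backend device that has a match.
func freeMemoryByUUID(backend []backendDevice, nvml []nvmlDevice) map[string]uint64 {
	byUUID := make(map[string]uint64, len(nvml))
	for _, d := range nvml {
		byUUID[d.UUID] = d.FreeMem
	}

	free := make(map[string]uint64, len(backend))
	for _, d := range backend {
		if mem, ok := byUUID[d.UUID]; ok {
			free[d.Name] = mem
		}
	}
	return free
}

func main() {
	backend := []backendDevice{{Name: "CUDA0", UUID: "GPU-1234"}}
	nvml := []nvmlDevice{{UUID: "GPU-1234", FreeMem: 8 << 30}}
	fmt.Println(freeMemoryByUUID(backend, nvml)) // map[CUDA0:8589934592]
}
```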
- 22 May, 2025 1 commit
- Jesse Gross authored
GGML has a function to report the allocated size of a backend buffer. However, this returns 0 if we tried to allocate a buffer and it failed. For memory management purposes, it's important to know how much we were trying to allocate. This extends the API to report attempted sizes for all buffers and whether the allocation succeeded.
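A minimal Go sketch of the bookkeeping this makes possible; the bufferAlloc type and its fields are hypothetical, not the extended GGML API itself:

```go
package main

import "fmt"

// bufferAlloc is a hypothetical record of one backend buffer allocation attempt.
// Size is what we asked for; Allocated reports whether the allocation succeeded.
type bufferAlloc struct {
	Name      string
	Size      uint64 // bytes requested
	Allocated bool
}

// required sums the attempted sizes across all buffers, including failed ones,
// which is the number a memory manager needs when deciding how to re-plan.
func required(allocs []bufferAlloc) (total uint64, failed int) {
	for _, a := range allocs {
		total += a.Size
		if !a.Allocated {
			failed++
		}
	}
	return total, failed
}

func main() {
	allocs := []bufferAlloc{
		{Name: "weights", Size: 4 << 30, Allocated: true},
		{Name: "graph", Size: 1 << 30, Allocated: false}, // failed, but the attempted size is still known
	}
	total, failed := required(allocs)
	fmt.Printf("attempted %d bytes, %d buffer(s) failed\n", total, failed)
}
```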
- 14 May, 2025 2 commits
- Bruce MacDonald authored
- Michael Yang authored
- 13 May, 2025 2 commits
- Jeffrey Morgan authored
- Jeffrey Morgan authored
- 12 May, 2025 1 commit
- Jeffrey Morgan authored
- 06 May, 2025 1 commit
- Daniel Hiltgen authored
* Move quantization logic to GGML via new backend. This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
* Remove "add model quantizations". This is no longer needed now that quantization is implemented directly in Go+GGML code.
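A rough Go sketch of the model-aware half of this split; the tensor type, naming rules, and target types below are illustrative assumptions, and the actual conversion is performed by GGML's quantization code:

```go
package main

import (
	"fmt"
	"strings"
)

// tensor is a stand-in for a model tensor scheduled for conversion.
type tensor struct {
	Name string
	Kind string // current type, e.g. "F16"
}

// targetType sketches the model-aware part that now lives in Go: deciding which
// quantization type each tensor should get. The rules here are illustrative
// only, not the ones ollama actually applies.
func targetType(t tensor, want string) string {
	switch {
	case strings.HasSuffix(t.Name, "_norm.weight"):
		return "F32" // keep norms in full precision
	case t.Name == "token_embd.weight":
		return "Q8_0" // keep embeddings at higher precision
	default:
		return want // e.g. "Q4_K_M"
	}
}

func main() {
	tensors := []tensor{
		{Name: "blk.0.attn_q.weight", Kind: "F16"},
		{Name: "blk.0.attn_norm.weight", Kind: "F32"},
		{Name: "token_embd.weight", Kind: "F16"},
	}
	for _, t := range tensors {
		// In the real change, the conversion itself is done by GGML's
		// quantization code (called from Go); here we only print the plan.
		fmt.Printf("%s: %s -> %s\n", t.Name, t.Kind, targetType(t, "Q4_K_M"))
	}
}
```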
- 02 May, 2025 2 commits
- Jesse Gross authored
Worst case graph preallocation was disabled by a27462b7 "ollamarunner: Temporarily disable worst case graph preallocation" since it caused crashes with large batches when not using the GPU. This backports upstream llama.cpp commit f057808 "ggml: Don't assert fail when tensor data changes (#13222)", which fixes the underlying bug and allows reverting the previous workaround.
- Jeffrey Morgan authored
- 25 Apr, 2025 1 commit
- Jeffrey Morgan authored
- 24 Apr, 2025 1 commit
- Parth Sareen authored
- 17 Apr, 2025 1 commit
- Jeffrey Morgan authored
- 16 Apr, 2025 1 commit
- Jeffrey Morgan authored
- 15 Apr, 2025 1 commit
- Jesse Gross authored
When ggml_backend_buffer_free() is called, the device memory is released but not all backends consistently release the actual ggml_backend_buffer_t in system RAM, causing a memory leak. Bug #10040
- 03 Apr, 2025 1 commit
- Bruce MacDonald authored
Mistral is a popular research lab making open-source models. This updates the forward pass of llama-architecture models to support both llama and mistral models by accounting for additional metadata present in mistral models and finding the correct dimensions for the output projection.
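A hedged Go sketch of what "finding the correct dimensions" could look like: prefer an explicit metadata key when the model carries one, otherwise fall back to the embedding length. The key names (mistral.output_length in particular) and the fallback rule are hypothetical, for illustration only:

```go
package main

import "fmt"

// kv is a stand-in for model metadata (key/value pairs from the model header).
type kv map[string]any

// outputDim picks the output projection size: use an explicit key if present
// (standing in for the extra metadata some mistral exports carry), otherwise
// fall back to the embedding length. Key names here are illustrative.
func outputDim(meta kv) int {
	if v, ok := meta["mistral.output_length"].(int); ok {
		return v // hypothetical mistral-specific key
	}
	if v, ok := meta["llama.embedding_length"].(int); ok {
		return v
	}
	return 0
}

func main() {
	llama := kv{"llama.embedding_length": 4096}
	mistral := kv{"llama.embedding_length": 5120, "mistral.output_length": 32768}
	fmt.Println(outputDim(llama), outputDim(mistral)) // 4096 32768
}
```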
- 27 Mar, 2025 1 commit
- saman-amd authored
- 15 Mar, 2025 1 commit
- Patrick Devine authored
- 11 Mar, 2025 1 commit
- Michael Yang authored
- 07 Mar, 2025 1 commit
- Jeffrey Morgan authored
- 03 Mar, 2025 1 commit
- Michael Yang authored
Expand backend loading error handling to catch more problems and log them instead of panicking.
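A minimal Go sketch of the pattern described, logging failures and recovering from panics instead of crashing; loadBackend is a placeholder, and the real loader's signature differs:

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// loadBackend is a placeholder for whatever actually loads a compute backend
// library; in the real code this crosses into C and can fail in many ways.
func loadBackend(path string) error {
	if path == "" {
		panic("empty backend path")
	}
	return errors.New("not a real backend")
}

// tryLoadBackend wraps backend loading so that both returned errors and panics
// are reported to the caller instead of taking the whole process down.
func tryLoadBackend(path string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("backend %q: %v", path, r)
		}
	}()
	if err := loadBackend(path); err != nil {
		return fmt.Errorf("backend %q: %w", path, err)
	}
	return nil
}

func main() {
	for _, p := range []string{"", "/tmp/libggml-cuda.so"} {
		if err := tryLoadBackend(p); err != nil {
			slog.Warn("skipping backend", "error", err) // log and continue
		}
	}
}
```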
- 28 Feb, 2025 1 commit
- Jeffrey Morgan authored
- 27 Feb, 2025 1 commit
- Jeffrey Morgan authored
- 24 Feb, 2025 1 commit
- Jeffrey Morgan authored
- 20 Feb, 2025 1 commit
- Michael Yang authored
- 19 Feb, 2025 1 commit
- Jeffrey Morgan authored
- 18 Feb, 2025 1 commit
- Michael Yang authored
Sapphire Rapids has AMX support, but it ends up having a negative performance impact. Emerald Rapids also has AMX support with a positive performance impact; however, there is no reasonable way in GGML to differentiate between the two. The impact is small (~6%), so disable AMX entirely for simplicity.
- 14 Feb, 2025 1 commit
- Jeffrey Morgan authored
- 11 Feb, 2025 1 commit
- Michael Yang authored
* wrap ggml_backend_load_best in try/catch
* ignore non-ollama paths
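A small Go sketch of the second point (ignoring non-ollama paths); the lib/ollama marker and the filtering rule are assumptions for illustration, and the try/catch half of the change lives on the C/C++ side:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// filterOllamaPaths keeps only candidate library paths that live under a
// directory ollama itself ships, so stray system libraries on the search path
// are not handed to the backend loader. The "lib/ollama" marker is an
// assumption for illustration.
func filterOllamaPaths(candidates []string) []string {
	var out []string
	for _, c := range candidates {
		clean := filepath.ToSlash(filepath.Clean(c))
		if strings.Contains(clean, "/lib/ollama/") {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	paths := []string{
		"/usr/local/lib/ollama/cuda_v12/libggml-cuda.so",
		"/usr/lib/x86_64-linux-gnu/libggml.so", // not ours: ignored
	}
	fmt.Println(filterOllamaPaths(paths))
}
```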
- 10 Feb, 2025 1 commit
- Jeffrey Morgan authored
- 05 Feb, 2025 1 commit
- Jeffrey Morgan authored
- 29 Jan, 2025 1 commit
- Michael Yang authored
* add build to .dockerignore
* test: only build one arch
* add build to .gitignore
* fix ccache path
* filter amdgpu targets
* only filter if autodetecting
* Don't clobber gpu list for default runner. This ensures the GPU-specific environment variables are set properly.
* explicitly set CXX compiler for HIP
* Update build_windows.ps1. This isn't complete, but is close. Dependencies are missing, and it only builds the "default" preset.
* build: add ollama subdir
* add .git to .dockerignore
* docs: update development.md
* update build_darwin.sh
* remove unused scripts
* llm: add cwd and build/lib/ollama to library paths
* default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS
* add additional cmake output vars for msvc
* interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12
* remove unnecessary filepath.Dir, cleanup
* add hardware-specific directory to path
* use absolute server path
* build: linux arm
* cmake install targets
* remove unused files
* ml: visit each library path once
* build: skip cpu variants on arm
* build: install cpu targets
* build: fix workflow
* shorter names
* fix rocblas install
* docs: clean up development.md
* consistent build dir removal in development.md
* silence -Wimplicit-function-declaration build warnings in ggml-cpu
* update readme
* update development readme
* llm: update library lookup logic now that there is one runner (#8587) (see the library path sketch after this list)
* tweak development.md
* update docs
* add windows cuda/rocm tests

Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
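As referenced in the list above, a sketch of what the library lookup could look like in Go; the directory layout and variant names are assumptions based on the commit messages, not the real implementation:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// libraryPaths builds candidate directories to search for backend libraries:
// next to the executable, the working directory's build output, and
// hardware-specific subdirectories such as cuda_v12. The exact layout mirrors
// the commit messages above but is only a sketch.
func libraryPaths(variants []string) []string {
	var bases []string

	if exe, err := os.Executable(); err == nil {
		exeDir := filepath.Dir(exe)
		bases = append(bases, exeDir, filepath.Join(exeDir, "lib", "ollama"))
	}
	if cwd, err := os.Getwd(); err == nil {
		bases = append(bases, filepath.Join(cwd, "build", "lib", "ollama"))
	}

	paths := append([]string{}, bases...)
	// Add hardware-specific directories (e.g. cuda_v12, rocm) under each base.
	for _, base := range bases {
		for _, v := range variants {
			paths = append(paths, filepath.Join(base, v))
		}
	}
	return paths
}

func main() {
	for _, p := range libraryPaths([]string{"cuda_v12", "rocm"}) {
		fmt.Println(p)
	}
}
```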
- 08 Jan, 2025 1 commit
- Jeffrey Morgan authored
- 17 Dec, 2024 1 commit
- Jesse Gross authored
Sometimes the KV cache requires defragmentation even without triggering the threshold heuristic. In this case, decoding will not be able to find a KV cache slot. This is particularly difficult for the caller to handle if it happens in between ubatches. To avoid this, we should immediately trigger a defrag. In addition, a heavily fragmented cache can require more than max_moves to defragment. Currently, we stop when we hit the limit, but this can leave a cache that still does not have adequate space even after defragmentation is triggered. Instead, we should do multiple batches of processing until everything is complete. Fixes #7949
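A toy Go sketch of the policy described, defragmenting on demand and looping until compaction finishes; the cache type and move accounting are stand-ins, not the runner's actual KV cache:

```go
package main

import "fmt"

// cache is a toy stand-in for a KV cache; Free counts usable slots.
type cache struct {
	Free      int
	Fragments int // fragmented regions that still need to be compacted
}

// defragBatch compacts up to maxMoves fragments and reports how many it moved.
// In the real code each move shifts KV cells; here we only count.
func defragBatch(c *cache, maxMoves int) int {
	moves := c.Fragments
	if moves > maxMoves {
		moves = maxMoves
	}
	c.Fragments -= moves
	c.Free += moves
	return moves
}

// ensureSlot applies the policy above: if no slot is available, trigger a
// defrag immediately (not only when the threshold heuristic fires), and keep
// running defrag batches until the cache is fully compacted rather than
// stopping after a single pass of maxMoves.
func ensureSlot(c *cache, maxMoves int) bool {
	if c.Free > 0 {
		return true
	}
	for c.Fragments > 0 {
		if defragBatch(c, maxMoves) == 0 {
			break
		}
	}
	return c.Free > 0
}

func main() {
	c := &cache{Free: 0, Fragments: 25}
	fmt.Println(ensureSlot(c, 10), c.Free) // true 25 (after 3 defrag batches)
}
```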
- 14 Dec, 2024 1 commit
- Jeffrey Morgan authored
- 12 Dec, 2024 1 commit
- Parth Sareen authored
- 11 Dec, 2024 1 commit
- Jeffrey Morgan authored
- 30 Oct, 2024 1 commit
- Jesse Gross authored
- Update mllama to take the cross attention state as embeddings in a batch, more similar to how Llava handles it. This improves integration with the input cache.
- Pass locations in a prompt for embeddings using tags similar to Llava.
- Abstract the interface to vision models so the main runner accesses Clip and Mllama similarly (see the sketch after this list).

Co-authored-by: Michael Yang <mxyng@pm.me>
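A minimal Go sketch of the abstraction in the last bullet; the interface name, method, and types are assumptions, not the runner's real API:

```go
package main

import "fmt"

// VisionModel sketches the abstraction described above: the runner talks to
// any image encoder through one interface instead of special-casing Clip and
// Mllama. The method name and types are illustrative only.
type VisionModel interface {
	// EncodeImage turns raw image bytes into embeddings that get placed into
	// the batch at the position of the image tag in the prompt.
	EncodeImage(data []byte) ([][]float32, error)
}

type clipModel struct{}

func (clipModel) EncodeImage(data []byte) ([][]float32, error) {
	return [][]float32{{0.1, 0.2}}, nil // placeholder embeddings
}

type mllamaModel struct{}

func (mllamaModel) EncodeImage(data []byte) ([][]float32, error) {
	// For mllama the result is the cross attention state, but the runner
	// treats it the same way: embeddings inserted into the batch.
	return [][]float32{{0.3, 0.4}}, nil
}

func main() {
	for name, m := range map[string]VisionModel{"clip": clipModel{}, "mllama": mllamaModel{}} {
		emb, _ := m.EncodeImage([]byte("fake image"))
		fmt.Println(name, len(emb), "embedding(s)")
	}
}
```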
- 26 Oct, 2024 1 commit
- Daniel Hiltgen authored
On Windows, compiled with gcc, the C++ regex library failed to handle the characters.
- 18 Oct, 2024 1 commit
- Patrick Devine authored
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>