  1. 10 May, 2025 1 commit
  2. 08 May, 2025 5 commits
  3. 07 May, 2025 5 commits
  4. 06 May, 2025 5 commits
  5. 05 May, 2025 7 commits
  6. 04 May, 2025 1 commit
  7. 03 May, 2025 3 commits
  8. 02 May, 2025 4 commits
    • ggml: Fix race that resulted in "context canceled" when loading · a6ef73f4
      Jesse Gross authored
      Successfully completing processing with an errgroup cancels the
      associated context. However, we also have a goroutine that is checking
      for cancelation of the context. As a result, there is a race where
      the goroutine can pick up the cancelation and report an error,
      replacing the successful (nil) result with a "context canceled" error.
      
      To avoid that, this replaces the goroutine with a cancelation check
      when we are reading files. This also has the advantage of stopping
      all reads relatively quickly on error and also ensuring that there are
      no outstanding I/O operations when we return in this case.
      
      The downside is that if a file read blocks forever (for example, over
      the network) then cancelation of the context effectively won't be
      honored. However, this is also true for other smaller files we read
      and the tensors are read in small chunks (128K), so it's consistent
      and better on balance overall.
    • ollamarunner: Re-enable worst case graph preallocation. · c2f5d666
      Jesse Gross authored
      Worst case graph preallocation was disabled by a27462b7
      "ollamarunner: Temporarily disable worst case graph preallocation"
      since it caused crashes with large batches when not using the GPU.
      
      This backports upstream llama.cpp commit f057808
      "ggml: Don't assert fail when tensor data changes (#13222)", which
      fixes the underlying bug and allows reverting the previous workaround.
    • llama: update to commit e1e8e099 (#10513) · 8dd12c87
      Jeffrey Morgan authored
  9. 01 May, 2025 6 commits
  10. 30 Apr, 2025 3 commits
    • strip out thinking tags in message history for qwen3 & r1 (#10490) · ad3c7c9b
      Devon Rifkin authored
      * strip out thinking tags in message history for qwen3 & r1
      
      This is in advance of "proper" support where we'll make reasoning
      configurable and we'll parse out thinking/reasoning tags and provide
      them to the caller. These models expect there to be no thinking tags in
      the message history, so this should improve quality
      
      * parse model names instead of hacky prefix check
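      A minimal sketch of the stripping step, assuming these models emit
      `<think>...</think>` blocks (the exact tag handling and function names in
      ollama may differ):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// thinkRE matches a <think>...</think> block, including newlines inside it
// ((?s) makes . match newlines). The tag name is an assumption based on the
// qwen3 / r1 output style.
var thinkRE = regexp.MustCompile(`(?s)<think>.*?</think>`)

// stripThinking removes reasoning blocks from a history message so the
// model never sees thinking tags in prior turns.
func stripThinking(content string) string {
	return strings.TrimSpace(thinkRE.ReplaceAllString(content, ""))
}

func main() {
	fmt.Println(stripThinking("<think>let me reason...</think>The answer is 4."))
	// → The answer is 4.
}
```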
    • Fix "Stopping..." scheduler hang (#10487) · 415c8fcc
      Daniel Hiltgen authored
      * Adjust initial scheduler refCount
      
      Ensure we only set the refCount on success
      
      * sched: fix lock order inversion deadlock
      
      Under certain race conditions, the scheduler could deadlock when it
      tried to update free space information at the same time a model was
      trying to unload.
    • Narrow set of paths we load GGML from (#10485) · 718eda1b
      Daniel Hiltgen authored
      Users may have other, incompatible GGML installs on their systems.
      This prevents us from trying to load those libraries from the system
      search path.