- 14 May, 2025 2 commits
-
-
Daniel Hiltgen authored
Older clients assumed the digest was at least 19 characters long so increase the size of the dummy digest to avoid array out of bounds crashes.
-
Michael Yang authored
-
- 13 May, 2025 1 commit
-
-
Jeffrey Morgan authored
-
- 12 May, 2025 3 commits
-
-
Daniel Hiltgen authored
The quantization PR didn't block all unsupported file types, which this PR fixes. It also updates the API docs to reflect the now reduced set of supported types.
-
Bruce MacDonald authored
When creating a quantized model from safetensors we need the array KV values to be loaded.Changing this value to -1 loads the KV values on the returned layer to be used and saved during quantization.
-
Michael Yang authored
reduce prompt log to trace level
-
- 08 May, 2025 2 commits
-
-
Michael Yang authored
the stream accumulator exits as soon as it sees `api.ProgressResponse(status="success")` which isn't strictly correctly since some requests may have multiple successes, e.g. `/api/create` when the source model needs to be pulled.
-
Michael Yang authored
-
- 07 May, 2025 2 commits
-
-
Daniel Hiltgen authored
If a model is loading, and the request context is canceled during the load by a client closing the connection, and another request is inbound for the same model with a different configuration (context size, etc.) thus requiring a reload, two unload events can be in flight. The first shuts down the original model load, but the second one caused the loss of the new reloading runner reference, thus triggering the leak. The primary fix is detecting the duplicate unload and ignoring the second instance. The load routine is also hardened to ensure we detect clobbering an already present runner and unload it with a warning.
-
Jeffrey Morgan authored
-
- 06 May, 2025 3 commits
-
-
Devon Rifkin authored
Fixes: #5483
-
Michael Yang authored
-
Daniel Hiltgen authored
* Move quantization logic to GGML via new backend This moves the model aware logic to Go code and calls GGMLs quantization code for model creation. * Remove "add model quantizations" This is no longer needed now that quantization is implemented in Go+GGML code directly.
-
- 05 May, 2025 1 commit
-
-
Jeffrey Morgan authored
-
- 03 May, 2025 1 commit
-
-
Daniel Hiltgen authored
This enhances our logging in the scheduler. The initial "waiting for server" log no longer claims an initial error state (now "not responding" which better reflects the actual state). Runners now have slog wiring to report more details about the runner, including PID.
-
- 01 May, 2025 1 commit
-
-
frob authored
Co-authored-by:Richard Lyons <frob@cloudstaff.com>
-
- 30 Apr, 2025 2 commits
-
-
Devon Rifkin authored
* strip out thinking tags in message history for qwen3 & r1 This is in advance of "proper" support where we'll make reasoning configurable and we'll parse out thinking/reasoning tags and provide them to the caller. These models expect there to be no thinking tags in the message history, so this should improve quality * parse model names instead of hacky prefix check
-
Daniel Hiltgen authored
* Adjust initial scheduler refCount Ensure we only set the refCount on success * sched: fix lock order inversion deadlock Under certain race conditions, there was a scenario where the scheduler would get into a deadlock while trying to update free space information while a model was trying to unload.
-
- 29 Apr, 2025 1 commit
-
-
Devon Rifkin authored
this is in part to "pay" for #10452, which doubled the default context length. The combination isn't fully neutral though, because even though the old 4x2k limit and the new 2x4k limit are memory equivalent, the 1x fallback is larger with 4k
-
- 28 Apr, 2025 1 commit
-
-
Devon Rifkin authored
This reverts commit 424f6486.
-
- 25 Apr, 2025 2 commits
-
-
Michael Yang authored
-
Michael Yang authored
the first call to http.ResponseWriter.Write implicitly calls WriteHeader with http.StatusOK if it hasn't already been called. once WriteHeader has been called, subsequent calls has no effect. Write is called when JSON encoding progressUpdateJSON{}. calls to http.ResponseWriter.WriteHeader after the first encode is useless and produces a warning: http: superfluous response.WriteHeader call from github.com/ollama/ollama/server/internal/registry.(*statusCodeRecorder).WriteHeader (server.go:77)
-
- 22 Apr, 2025 1 commit
-
-
Devon Rifkin authored
* increase default context length to 4096 We lower the default numParallel from 4 to 2 and use these "savings" to double the default context length from 2048 to 4096. We're memory neutral in cases when we previously would've used numParallel == 4, but we add the following mitigation to handle some cases where we would have previously fallen back to 1x2048 due to low VRAM: we decide between 2048 and 4096 using a runtime check, choosing 2048 if we're on a one GPU system with total VRAM of <= 4 GB. We purposefully don't check the available VRAM because we don't want the context window size to change unexpectedly based on the available VRAM. We plan on making the default even larger, but this is a relatively low-risk change we can make to quickly double it. * fix tests add an explicit context length so they don't get truncated. The code that converts -1 from being a signal for doing a runtime check isn't running as part of these tests. * tweak small gpu message * clarify context length default also make it actually show up in `ollama serve --help`
-
- 19 Apr, 2025 2 commits
-
-
Michael Yang authored
the models directory should have plenty of storage and also ensure there's no cross-device copy
-
Blake Mizerany authored
Previously, the pull handler would send an error message in the Status field, this prevented the client from using the message as a signal to stop. In the case of the "run" command, it would follow the pull with a "show" which would print a nearly identical "not found" message for unresolved models. Fixes #10307
-
- 17 Apr, 2025 1 commit
-
-
Blake Mizerany authored
-
- 16 Apr, 2025 4 commits
-
-
Blake Mizerany authored
This removes the extra flushProgress() at the end of handlePull. It is unnecessary because final progress updates are flushed in all cases of the main select loop.
-
Blake Mizerany authored
The completed and received counters must work in tandem and the code should better reflect that. Previously, the act of updating them was 2-3 lines of code duplicated in multiple places. This consolidates them into a single update closure for easy reading and maintenance. This also simplifies error handling in places where we can use a return parameter and defer to handle the error case for updates. Also, remove the old Layer field from the trackingReader struct.
-
Daniel Hiltgen authored
Fix flake failures on windows
-
Blake Mizerany authored
This commit adds retry/backoff to the registry client for pull requests. Also, revert progress indication to match original client's until we can "get it right." Also, make WithTrace wrap existing traces instead of clobbering them. This allows clients to compose traces.
-
- 14 Apr, 2025 1 commit
-
-
Devon Rifkin authored
alphabetized the compat list and then added a single header fixes: #9801
-
- 10 Apr, 2025 1 commit
-
-
Tom Sheffler authored
--------- Co-authored-by:Parth Sareen <parth.sareen@ollama.com>
-
- 09 Apr, 2025 1 commit
-
-
Ire Gaddr authored
-
- 08 Apr, 2025 1 commit
-
-
Parth Sareen authored
-
- 07 Apr, 2025 1 commit
-
-
Alex Rozgo authored
-
- 03 Apr, 2025 1 commit
-
-
Bruce MacDonald authored
No functional change. Many different done reasons can be set at the runner level, so rather than obsuring them we should return them to the server process and let it choose what to do with the done reason. This separates the API concerns from the runner.
-
- 02 Apr, 2025 1 commit
-
-
Bruce MacDonald authored
Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
-
- 01 Apr, 2025 1 commit
-
-
Bruce MacDonald authored
With support for multimodal models becoming more varied and common it is important for clients to be able to easily see what capabilities a model has. Retuning these from the show endpoint will allow clients to easily see what a model can do.
-
- 31 Mar, 2025 1 commit
-
-
Blake Mizerany authored
This change adds tracking of download chunks during the pull process so that subsequent pulls can skip downloading already completed chunks. This works across restarts of ollama. Currently, download state will be lost if a prune is triggered during a pull (e.g. restart or remove). This issue should be addressed in a follow-up PR.
-
- 28 Mar, 2025 1 commit
-
-
CYJiang authored
Co-authored-by:Bruce MacDonald <brucewmacdonald@gmail.com>
-