- 11 May, 2025 2 commits

HardCodeDev authored

HardCodeDev authored

- 10 May, 2025 5 commits

frob authored

frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>

Michael Yang authored
ml.Dump will preserve default values if not specified
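
A minimal sketch of the behavior described, using Go's functional-options pattern; the names here (`dumpOptions`, `WithPrecision`, `WithItems`) are illustrative, not Ollama's actual `ml` API:

```go
package main

import "fmt"

// dumpOptions holds the knobs for Dump; hypothetical names for illustration.
type dumpOptions struct {
	precision int // digits printed per element
	items     int // elements shown per dimension
}

// DumpOption overrides a single field of the defaults.
type DumpOption func(*dumpOptions)

func WithPrecision(p int) DumpOption { return func(o *dumpOptions) { o.precision = p } }
func WithItems(n int) DumpOption     { return func(o *dumpOptions) { o.items = n } }

// Dump starts from defaults and only overrides what the caller specifies,
// so unspecified values are preserved rather than zeroed.
func Dump(name string, opts ...DumpOption) {
	o := dumpOptions{precision: 4, items: 3} // defaults
	for _, opt := range opts {
		opt(&o)
	}
	fmt.Printf("%s: precision=%d items=%d\n", name, o.precision, o.items)
}

func main() {
	Dump("t0")                   // t0: precision=4 items=3
	Dump("t1", WithPrecision(8)) // t1: precision=8 items=3 (items kept at default)
}
```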

AliAhmedNada authored

Bruce MacDonald authored

- 08 May, 2025 5 commits

Michael Yang authored
The stream accumulator exits as soon as it sees `api.ProgressResponse(status="success")`, which isn't strictly correct since some requests may have multiple successes, e.g. `/api/create` when the source model needs to be pulled.
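
A sketch of the distinction, with a simplified stand-in for `api.ProgressResponse`: the accumulator drains the stream to completion rather than returning at the first "success", since a `/api/create` that pulls its source model reports more than one.

```go
package sketch

// ProgressResponse stands in for api.ProgressResponse in this sketch.
type ProgressResponse struct {
	Status string
}

// accumulate drains the stream until it closes. Returning as soon as
// Status == "success" would drop later updates when a request (such as
// /api/create pulling its source model) emits multiple successes.
func accumulate(stream <-chan ProgressResponse) []ProgressResponse {
	var all []ProgressResponse
	for resp := range stream {
		all = append(all, resp) // keep going even after a "success"
	}
	return all
}
```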

Michael Yang authored

Michael Yang authored

Jeffrey Morgan authored

Jesse Gross authored
The correct constant to remove all entries to the end of the sequence for the Ollama engine is math.MaxInt32; -1 is used by the old engine. The impact of this is currently minimal because it would only occur in situations that are not supported by the implemented models or with rarely used options.
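
A small illustration of the idea (function and constant names hypothetical): the "through end of sequence" sentinel differs per engine, so it must be chosen rather than hard-coded.

```go
package sketch

import "math"

// Sentinels meaning "remove through the end of the sequence".
const (
	endOfSeqLegacy = -1            // old llama.cpp-based engine
	endOfSeqOllama = math.MaxInt32 // Ollama engine
)

// removeToEnd picks the right sentinel for the engine in use; cacheRemove
// stands in for the engine's cache-removal call.
func removeToEnd(cacheRemove func(seq, start, end int32), seq, start int32, ollamaEngine bool) {
	end := int32(endOfSeqLegacy)
	if ollamaEngine {
		end = endOfSeqOllama
	}
	cacheRemove(seq, start, end)
}
```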

- 07 May, 2025 5 commits

Daniel Hiltgen authored

Daniel Hiltgen authored
If a model is loading and the request context is canceled mid-load by a client closing the connection, while another request for the same model arrives with a different configuration (context size, etc.) that requires a reload, two unload events can be in flight. The first shuts down the original model load, but the second caused the loss of the new reloading runner's reference, triggering the leak. The primary fix is to detect the duplicate unload and ignore the second instance. The load routine is also hardened to detect clobbering an already present runner and unload it with a warning.
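
A simplified sketch of the guard described (not the actual scheduler code): only the runner still registered for the model may be unloaded, so a stale duplicate unload is ignored instead of clobbering the reloading runner's reference.

```go
package sketch

import (
	"log/slog"
	"sync"
)

type runner struct{ stop func() }

type scheduler struct {
	mu     sync.Mutex
	loaded map[string]*runner // model path -> active runner
}

// unload tears down r only if it is still the registered runner for model.
func (s *scheduler) unload(model string, r *runner) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.loaded[model] != r {
		// A reload already replaced this runner; dropping the newer
		// reference here is what leaked it before.
		slog.Warn("ignoring duplicate unload", "model", model)
		return
	}
	delete(s.loaded, model)
	r.stop()
}
```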

Jeffrey Morgan authored

Daniel Hiltgen authored
The `err` in the goroutine should not be shared with the outer scope.
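
The general bug class in a minimal sketch: a goroutine assigning to an outer `err` races with the caller's error handling; handing the result back over a channel keeps the error goroutine-local.

```go
package sketch

// doAsync runs work in a goroutine without sharing err across scopes.
func doAsync(work func() error) error {
	errCh := make(chan error, 1)
	go func() {
		errCh <- work() // goroutine-local result, handed back explicitly
	}()
	return <-errCh // instead of `err = work()` inside the goroutine
}
```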

Daniel Hiltgen authored
This reduces the size of our Windows installer payloads by ~256M by dropping support for NVIDIA drivers older than Feb 2023. Hardware support is unchanged. Linux default bundle sizes are reduced by ~600M to 1G.

- 06 May, 2025 5 commits

Aharon Bensadoun authored

Devon Rifkin authored
Fixes: #5483

Michael Yang authored

Daniel Hiltgen authored
* Move quantization logic to GGML via new backend: this moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
* Remove "add model quantizations": this is no longer needed now that quantization is implemented in Go+GGML code directly.

Michael Yang authored

- 05 May, 2025 7 commits

Jeffrey Morgan authored
Some options listed in api/types.go are not supported in newer models or have been deprecated. This is the first in a series of PRs to clean up the API options.

Michael Yang authored
* default max term height
* error on out of tree files

Jesse Gross authored
Most of the time this is not an error.

Daniel Hiltgen authored

Ashok Gelal authored
This hides the LlamaServer blank window when chatting outside of the terminal (say, with an app like Msty). It has no other side effects when invoked the regular way.
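
On Windows this kind of fix typically sets `HideWindow` on the spawned process. A minimal sketch using the standard library (illustrative, not necessarily how Ollama wires it):

```go
//go:build windows

package sketch

import (
	"os/exec"
	"syscall"
)

// startHidden launches a subprocess without the console window that would
// otherwise flash up when the parent is a GUI app rather than a terminal.
func startHidden(path string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(path, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{HideWindow: true}
	return cmd, cmd.Start()
}
```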

Jeffrey Morgan authored

Jeffrey Morgan authored

- 04 May, 2025 1 commit

湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>

- 03 May, 2025 3 commits

Daniel Hiltgen authored
For all search-path env vars, make sure our dirs come first to avoid potentially finding other, incompatible libraries on the user's system. Also fixes a minor build script glitch for Windows ROCm.
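
A sketch of the idea (helper name hypothetical): prepend our directory to a search-path variable such as PATH or LD_LIBRARY_PATH so it is consulted before anything else on the system.

```go
package sketch

import (
	"os"
	"strings"
)

// prependDir puts dir at the front of the key=value entry in env so our
// libraries are found before any others installed on the system.
func prependDir(env []string, key, dir string) []string {
	prefix := key + "="
	for i, kv := range env {
		if strings.HasPrefix(kv, prefix) {
			env[i] = prefix + dir + string(os.PathListSeparator) + kv[len(prefix):]
			return env
		}
	}
	return append(env, prefix+dir)
}
```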

Daniel Hiltgen authored
This enhances our logging in the scheduler. The initial "waiting for server" log no longer claims an error state (now "not responding", which better reflects the actual state). Runners now have slog wiring to report more details about the runner, including the PID.
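
The slog wiring might look roughly like this sketch (attribute names illustrative): attach stable attributes such as the PID once, so every line the runner logs carries them.

```go
package sketch

import (
	"log/slog"
	"os/exec"
)

// runnerLogger returns a logger pre-tagged with the runner's identity, so
// details like the PID appear on every subsequent log line. cmd must have
// been started already for Process to be non-nil.
func runnerLogger(model string, cmd *exec.Cmd) *slog.Logger {
	return slog.Default().With("model", model, "pid", cmd.Process.Pid)
}

// usage: runnerLogger(m, cmd).Info("waiting for server", "status", "not responding")
```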

aritra saha authored

- 02 May, 2025 4 commits

Jesse Gross authored
Successfully completing processing with an errgroup cancels the associated context. However, we also have a goroutine checking for cancelation of the context, so there is a race where the goroutine can pick up the cancelation and report an error, replacing the successful result. To avoid that, this replaces the goroutine with a cancelation check when we are reading files. This also has the advantage of stopping all reads relatively quickly on error and ensuring there are no outstanding I/O operations when we return in this case. The downside is that if a file read blocks forever (for example, over the network), cancelation of the context effectively won't be honored. However, this is also true for other, smaller files we read, and the tensors are read in small chunks (128K), so it's consistent and better on balance overall.
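
A condensed sketch of the pattern: rather than a separate goroutine watching `ctx.Done()` and racing the errgroup's own cancelation, the read loop checks the context between chunks. Function and chunk-size details here are illustrative.

```go
package sketch

import (
	"context"
	"io"

	"golang.org/x/sync/errgroup"
)

// readAll reads every file in chunks, checking for cancelation between
// reads instead of running a separate watcher goroutine.
func readAll(ctx context.Context, files []io.Reader) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, f := range files {
		f := f // capture loop variable (pre-Go 1.22)
		g.Go(func() error {
			buf := make([]byte, 128<<10) // tensors are read in 128K chunks
			for {
				if err := ctx.Err(); err != nil {
					return err // canceled elsewhere: stop reading promptly
				}
				if _, err := f.Read(buf); err != nil {
					if err == io.EOF {
						return nil
					}
					return err
				}
			}
		})
	}
	return g.Wait()
}
```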

Jesse Gross authored
Worst case graph preallocation was disabled by a27462b7 "ollamarunner: Temporarily disable worst case graph preallocation" since it caused crashes with large batches when not using the GPU. This backports upstream llama.cpp commit f057808 "ggml: Don't assert fail when tensor data changes (#13222)", which fixes the underlying bug and allows reverting the previous workaround.

Harsh Nevse authored

Jeffrey Morgan authored

- 01 May, 2025 3 commits

frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>

Jesse Gross authored
In some cases, we can't find a cache slot when using sliding window attention. It would be helpful in this case (and others) to know what the batch size is. Bug #10127

Jesse Gross authored
The context (and therefore the associated input tensors) was not being properly closed when images were being processed. We were trying to close them, but in reality we were closing over an empty list, preventing anything from actually being freed. Fixes #10434
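
The bug class in miniature (illustrative, not the actual code): arguments to a deferred call are evaluated immediately, so deferring cleanup over a slice that is still empty frees nothing, while a closure that reads the slice when it runs sees everything appended since.

```go
package sketch

type tensor struct{}

func (t *tensor) Close() {}

func closeAll(ts []*tensor) {
	for _, t := range ts {
		t.Close()
	}
}

func process() {
	var tensors []*tensor

	// BAD: evaluates `tensors` now, while it is still empty, so the
	// deferred call closes over an empty list and frees nothing:
	// defer closeAll(tensors)

	// GOOD: the closure reads `tensors` when it actually runs.
	defer func() { closeAll(tensors) }()

	for i := 0; i < 4; i++ {
		tensors = append(tensors, &tensor{})
	}
}
```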