- 14 Mar, 2025 5 commits
-
-
Jesse Gross authored
Currently there is a single context per sequence, shared by all multimodal inputs. Since we build a vision encoder graph per image, with a large number of inputs we can eventually hit the maximum number of graph nodes per context. This change uses a separate context for each image, ensuring that available resource limits are applied consistently.
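The idea can be sketched as follows. This is an illustrative model only (the type and limit names are hypothetical, not the actual ollama API): with a shared context the node count accumulates across images, while a fresh context per image keeps each graph under the per-context cap.

```go
package main

import "fmt"

// maxNodesPerContext is an illustrative per-context graph-node limit.
const maxNodesPerContext = 8192

// Context is a hypothetical stand-in for a compute-graph context.
type Context struct{ nodes int }

func (c *Context) addNodes(n int) error {
	if c.nodes+n > maxNodesPerContext {
		return fmt.Errorf("graph node limit exceeded: %d + %d > %d", c.nodes, n, maxNodesPerContext)
	}
	c.nodes += n
	return nil
}

// encodeImages allocates a fresh context per image, so the node budget
// applies to one vision-encoder graph at a time rather than accumulating.
func encodeImages(nodesPerImage, numImages int) error {
	for i := 0; i < numImages; i++ {
		ctx := &Context{} // new context for each image
		if err := ctx.addNodes(nodesPerImage); err != nil {
			return fmt.Errorf("image %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	// 4 images x 3000 nodes would exceed an 8192-node shared context,
	// but each per-image context stays well under the limit.
	fmt.Println(encodeImages(3000, 4))
}
```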
-
Jesse Gross authored
Models may require that a set of inputs all be processed as part of the same batch. For example, if an image has multiple patches with fully connected attention between them, we should not split the batch in the middle of an image. Fixes #9697
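A minimal sketch of this batching rule (names are illustrative, not the actual runner code): inputs that must be processed together form a group, and a batch is flushed early rather than splitting a group across batches.

```go
package main

import "fmt"

type input struct{ id int }

// splitBatches packs groups of inputs into batches of at most batchSize
// inputs, starting a new batch instead of splitting a group mid-way.
// (A single group larger than batchSize still yields one oversized batch.)
func splitBatches(groups [][]input, batchSize int) [][]input {
	var batches [][]input
	var cur []input
	for _, g := range groups {
		if len(cur) > 0 && len(cur)+len(g) > batchSize {
			batches = append(batches, cur) // flush rather than split the group
			cur = nil
		}
		cur = append(cur, g...)
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	text := []input{{1}}
	image := []input{{2}, {3}, {4}} // three image patches: must stay together
	batches := splitBatches([][]input{text, image}, 3)
	fmt.Println(len(batches), len(batches[0]), len(batches[1])) // 2 1 3
}
```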
-
Bruce MacDonald authored
This commit refactors the LLM subsystem by removing internal subprocess request and response types. It consolidates duplicate type definitions across the codebase, moving them to centralized locations. The change also standardizes interfaces between components, simplifies the ServerStatusResp struct, and moves the ParseDurationMs function to a common package. This cleanup reduces code duplication between different runner implementations (llamarunner and ollamarunner).
-
Blake Mizerany authored
-
Blake Mizerany authored
Replace large-chunk blob downloads with parallel small-chunk verification to solve timeout and performance issues. Registry users experienced progressively slowing download speeds as large-chunk transfers aged, often timing out completely. The previous approach downloaded blobs in a few large chunks but required a separate, single-threaded pass to read the entire blob back from disk for verification after download completion. This change uses the new chunksums API to fetch many smaller chunk+digest pairs, allowing concurrent downloads and immediate verification as each chunk arrives. Chunks are written directly to their final positions, eliminating the entire separate verification pass. The result is more reliable downloads that maintain speed throughout the transfer process and significantly faster overall completion, especially over unstable connections or with large blobs.
-
- 13 Mar, 2025 17 commits
-
-
Michael Yang authored
count gemma3 vision tensors
-
Michael Yang authored
-
Bradley Erickson authored
-
Michael Yang authored
-
Michael Yang authored
the largest operation by far is (q @ k), so just count that for simplicity
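The simplification amounts to something like the following (a hedged sketch, not the actual code): estimate attention compute from the dominant Q @ K^T matmul alone, roughly 2 * heads * seqLen^2 * headDim multiply-adds, and ignore the smaller surrounding ops.

```go
package main

import "fmt"

// attnFlopsEstimate counts only the Q @ K^T matmul: for each of `heads`
// heads, a (seqLen x headDim) by (headDim x seqLen) product costs about
// 2 * seqLen^2 * headDim flops (multiply + add).
func attnFlopsEstimate(heads, seqLen, headDim int) int64 {
	return 2 * int64(heads) * int64(seqLen) * int64(seqLen) * int64(headDim)
}

func main() {
	fmt.Println(attnFlopsEstimate(32, 2048, 128)) // 34359738368
}
```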
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
fix: error if image requested without vision model
-
Patrick Devine authored
Add metadata and tensor information to the show command to be able to see more information about a model. This outputs the same data as shown on the model details page on ollama.com
-
Patrick Devine authored
-
Michael Yang authored
fix: error on models that don't support embeddings
-
Michael Yang authored
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
ollama-debug.c: correct typo
-
Parth Sareen authored
-
shane.xb.qian authored
* macOS has a different definition, per info from @mxyng
-
- 12 Mar, 2025 8 commits
-
-
ParthSareen authored
-
ParthSareen authored
-
ParthSareen authored
-
Bruce MacDonald authored
Softcap isn't in the whitepaper/implementation for the language model so we should remove it. There is no discernible difference in output with it removed.
-
Shane-XB-Qian authored
Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>
-
shane.xb.qian authored
Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>
-
frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
-
Michael authored
-
- 11 Mar, 2025 10 commits
-
-
Michael Yang authored
engine: add gemma support
-
jmorganca authored
-
jmorganca authored
-
jmorganca authored
-
jmorganca authored
-
Michael Yang authored
-
Daniel Hiltgen authored
-
jmorganca authored
-
jmorganca authored
-
jmorganca authored
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
-