- 17 Mar, 2025 1 commit
-
-
zeo authored
-
- 15 Mar, 2025 3 commits
-
-
Patrick Devine authored
This fixes the case where a FROM line in a previous Modelfile points to a file which may or may not be present in a different ollama instance. We shouldn't be relying on the filename, though; instead, check whether the FROM line is a valid model name and, if so, point to that.
-
Blake Mizerany authored
This sets the agent header in DefaultRegistry to include the version of the client, OS, and architecture in the previous format, with a minor twist. Note: the version is obtained from the build info instead of from version.Version, which should no longer be necessary, but we can remove it in a future commit. Using the build info is more accurate and also provides extra build information when the build is not tagged, and when it is "dirty". Previously, the version was just "0.0.0" with no other helpful information. The ollama.com registry and others handle this swimmingly.
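A minimal sketch of deriving such a version string from the Go build info (the header format, prefix, and fallback below are illustrative assumptions, not Ollama's literal strings):

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// userAgent builds a User-Agent-style string such as
// "ollama/<version> (<arch> <os>) Go/<goversion>". The version comes from
// the module build info when available, falling back to "0.0.0" otherwise.
func userAgent() string {
	version := "0.0.0" // fallback when build info is unavailable or untagged
	if info, ok := debug.ReadBuildInfo(); ok &&
		info.Main.Version != "" && info.Main.Version != "(devel)" {
		version = info.Main.Version
	}
	return fmt.Sprintf("ollama/%s (%s %s) Go/%s",
		version, runtime.GOARCH, runtime.GOOS, runtime.Version())
}

func main() {
	fmt.Println(userAgent())
}
```

When the binary is built from a tagged module, `debug.ReadBuildInfo` also carries VCS revision and "dirty" flags in `info.Settings`, which is what makes it richer than a hard-coded version constant.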
-
Patrick Devine authored
-
- 14 Mar, 2025 7 commits
-
-
Daniel Hiltgen authored
Darwin was using a different pattern for the version string than Linux or Windows.
-
Jesse Gross authored
Previously, processing multiple images in a batch would trigger segfaults, so sending images together was disabled as a way to mitigate this. The trigger was processing one image on the CPU and one on the GPU. This can no longer happen:
- The vision encoder is now on the GPU, so both images would be processed on the GPU.
- We require images to be fully contained in a batch, and each image, including its special tokens, is over half the batch size. As a result, we will never get two images in the same batch.

Fixes #9731
-
Jesse Gross authored
Currently there is a single context per sequence, shared by all multimodal inputs. Since we build a vision encoder graph per image, with a large number of inputs we can eventually hit the maximum number of graph nodes per context. This changes to using a separate context for each image, ensuring that available resource limits are consistent.
-
Jesse Gross authored
Models may require that a set of inputs all be processed as part of the same batch. For example, if an image has multiple patches with fully connected attention between them, we should not split the batch in the middle of an image. Fixes #9697
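The no-split constraint above can be sketched as a packing rule: inputs belonging to the same image form a group, and a group is never divided across two batches. The types and field names here are hypothetical, not the actual runner's scheduler:

```go
package main

import "fmt"

// Input is one token or embedding destined for a batch. Inputs that belong
// to the same image share a nonzero groupID; such a group must land whole
// in a single batch.
type Input struct {
	groupID int
}

// splitBatches packs inputs into batches of at most batchSize, flushing the
// current batch early rather than splitting a group (e.g. an image and its
// special tokens) across a batch boundary.
func splitBatches(inputs []Input, batchSize int) [][]Input {
	var batches [][]Input
	var cur []Input
	for i := 0; i < len(inputs); {
		// collect the whole group starting at i
		j := i + 1
		if inputs[i].groupID != 0 {
			for j < len(inputs) && inputs[j].groupID == inputs[i].groupID {
				j++
			}
		}
		group := inputs[i:j]
		if len(cur)+len(group) > batchSize && len(cur) > 0 {
			batches = append(batches, cur) // flush instead of splitting
			cur = nil
		}
		cur = append(cur, group...)
		i = j
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	// two text tokens, a three-input image (group 1), one more text token
	inputs := []Input{{0}, {0}, {1}, {1}, {1}, {0}}
	for _, b := range splitBatches(inputs, 4) {
		fmt.Println(len(b))
	}
}
```

With a batch size of 4, the three-input image does not fit after the two text tokens, so the first batch is flushed at two inputs and the image starts the next batch intact.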
-
Bruce MacDonald authored
This commit refactors the LLM subsystem by removing internal subprocess request and response types. It consolidates duplicate type definitions across the codebase, moving them to centralized locations. The change also standardizes interfaces between components, simplifies the ServerStatusResp struct, and moves the ParseDurationMs function to a common package. This cleanup reduces code duplication between different runner implementations (llamarunner and ollamarunner).
-
Blake Mizerany authored
-
Blake Mizerany authored
Replace large-chunk blob downloads with parallel small-chunk verification to solve timeout and performance issues. Registry users experienced progressively slowing download speeds as large-chunk transfers aged, often timing out completely. The previous approach downloaded blobs in a few large chunks but required a separate, single-threaded pass to read the entire blob back from disk for verification after download completion. This change uses the new chunksums API to fetch many smaller chunk+digest pairs, allowing concurrent downloads and immediate verification as each chunk arrives. Chunks are written directly to their final positions, eliminating the entire separate verification pass. The result is more reliable downloads that maintain speed throughout the transfer process and significantly faster overall completion, especially over unstable connections or with large blobs.
-
- 13 Mar, 2025 17 commits
-
-
Michael Yang authored
count gemma3 vision tensors
-
Michael Yang authored
-
Bradley Erickson authored
-
Michael Yang authored
-
Michael Yang authored
the largest operation by far is (q @ k), so just count that for simplicity
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
fix: error if image requested without vision model
-
Patrick Devine authored
Add metadata and tensor information to the show command to be able to see more information about a model. This outputs the same data as shown on the model details page on ollama.com
-
Patrick Devine authored
-
Michael Yang authored
fix: error on models that don't support embeddings
-
Michael Yang authored
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
ollama-debug.c: correct typo
-
Parth Sareen authored
-
shane.xb.qian authored
* macOS has a different definition, per info from @mxyng
-
- 12 Mar, 2025 8 commits
-
-
ParthSareen authored
-
ParthSareen authored
-
ParthSareen authored
-
Bruce MacDonald authored
Softcap isn't in the whitepaper/implementation for the language model, so we should remove it. There is no discernible difference in output with it removed.
-
Shane-XB-Qian authored
Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>
-
shane.xb.qian authored
Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>
-
frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
-
Michael authored
-
- 11 Mar, 2025 4 commits
-
-
Michael Yang authored
engine: add gemma support
-
jmorganca authored
-
jmorganca authored
-
jmorganca authored
-