- 25 Jul, 2025 1 commit
Ruyut authored

- 24 Jul, 2025 2 commits
Patrick Devine authored
Jeffrey Morgan authored

- 23 Jul, 2025 2 commits
minxinyi authored
Michael Yang authored

- 22 Jul, 2025 2 commits
Patrick Devine authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
ycomiti authored

- 20 Jul, 2025 2 commits
Stefan Wärting authored
Jeffrey Morgan authored
Co-authored-by: frob <rick+github@frob.com.au>

- 19 Jul, 2025 1 commit
zmldndx authored

- 17 Jul, 2025 5 commits
Daniel Hiltgen authored
The macos-13 runner is x86, while macos-13-xlarge is arm64.
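As a minimal illustration (not from the repository), a Go test can guard against running on the wrong runner; the test name and package are invented:
```go
package integration

import (
	"runtime"
	"testing"
)

// TestARM64Only is illustrative: it skips on the x86 macos-13
// runners and runs on the arm64 macos-13-xlarge runners.
func TestARM64Only(t *testing.T) {
	if runtime.GOOS != "darwin" || runtime.GOARCH != "arm64" {
		t.Skipf("requires darwin/arm64, got %s/%s", runtime.GOOS, runtime.GOARCH)
	}
	// arm64-specific assertions would go here
}
```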
frob authored

frob authored
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
Haiyue Wang authored

Michael Yang authored

- 16 Jul, 2025 3 commits
Parth Sareen authored

Bruce MacDonald authored
StatusError was unreachable: the client always checked for an error message in the response body first, and the server always includes an error message with HTTP error status codes.
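A minimal sketch of why that ordering leaves a status-based error type unreachable, assuming a JSON body with an error field (names are illustrative, not the actual client code):
```go
package client

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// errorResponse mirrors an error payload sent alongside a non-2xx
// status; the field name is an assumption for illustration.
type errorResponse struct {
	Error string `json:"error"`
}

// checkError consults the body's message first; since the server
// always includes one with an HTTP error status, the final
// status-only fallback is dead code in practice.
func checkError(resp *http.Response) error {
	if resp.StatusCode < http.StatusBadRequest {
		return nil
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	var er errorResponse
	if json.Unmarshal(body, &er) == nil && er.Error != "" {
		return fmt.Errorf("%s (status %d)", er.Error, resp.StatusCode)
	}
	// Unreachable in practice: every error status carries a message.
	return fmt.Errorf("status %d", resp.StatusCode)
}
```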
Marcelo Fornet authored

- 11 Jul, 2025 4 commits
Jesse Gross authored
Reporting params.NumGPULayers can be misleading because it is the requested number of layers, not the actual number loaded. While the two are often the same, they can mismatch, for example if the GPU backend is missing.
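As a sketch of the distinction (the struct, fields, and function are hypothetical; only params.NumGPULayers comes from the change itself):
```go
package status

import "log/slog"

// layerCounts separates the layer count the caller requested
// (params.NumGPULayers) from the count the backend actually loaded;
// the two can differ, e.g. when the GPU backend is missing.
type layerCounts struct {
	Requested int
	Loaded    int
}

// report logs the loaded count rather than the requested one, so the
// log is not misleading when they diverge.
func report(c layerCounts) {
	slog.Info("offloaded layers", "requested", c.Requested, "loaded", c.Loaded)
}
```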
Jesse Gross authored
We're not currently using it, even in cases where we could. Disabling it improves generation performance by 10-30% with multiple GPUs.
Daniel Hiltgen authored
* Only load supported models on the new engine: verify the model is supported before trying to load it
* int: test case for all library models
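A minimal sketch of the verify-before-load ordering, with hypothetical function names rather than the engine's actual API:
```go
package engine

import (
	"errors"
	"fmt"
)

// errUnsupported is a hypothetical sentinel for models the new
// engine cannot run.
var errUnsupported = errors.New("model not supported on new engine")

// Load verifies support before doing any expensive loading work, so
// unsupported models fail fast instead of partway through a load.
func Load(name string) error {
	if !isSupported(name) {
		return fmt.Errorf("%s: %w", name, errUnsupported)
	}
	return loadModel(name)
}

// isSupported and loadModel stand in for the real checks and loader.
func isSupported(name string) bool { return true }

func loadModel(name string) error { return nil }
```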
- 09 Jul, 2025 1 commit
Jesse Gross authored
We don't get valid UUIDs for AMD GPUs on Windows, so the best option is to use the ordinal IDs. This brings us in line with what we currently do on the Ollama server; the only exception is AMD GPUs on Linux, which fall back to using ordinal IDs. The GGML implementation has no fallback, but the situation doesn't appear to occur for any of the GPUs that we support. It's also possible for ordinal IDs from different libraries to collide; however, the only places where we use them are AMD on Windows and Metal on Mac, which can never occur on the same system.
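A sketch of that ID-selection rule, with illustrative types and fields rather than the actual implementation:
```go
package discover

import "fmt"

// device holds the identifiers a backend reports for one GPU. The
// type and its fields are hypothetical stand-ins.
type device struct {
	UUID    string // empty when the platform reports no valid UUID
	Ordinal int    // index within the library that enumerated the GPU
	Library string // e.g. "rocm" or "metal"; ordinals are scoped per library
}

// ID prefers a stable UUID and falls back to an ordinal ID, as with
// AMD GPUs on Windows. Scoping the ordinal by library suffices here
// because the libraries that need the fallback never share a system.
func (d device) ID() string {
	if d.UUID != "" {
		return d.UUID
	}
	return fmt.Sprintf("%s-%d", d.Library, d.Ordinal)
}
```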
- 08 Jul, 2025 3 commits
Daniel Hiltgen authored
Also removes stale model dir instructions for Windows.
Daniel Hiltgen authored
The current scheduler algorithm picks parallelism based on available VRAM, which complicates the upcoming dynamic layer memory allocation algorithm. This changes the default to 1, with the intent that, going forward, parallelism is explicit and no longer dynamically determined. Removal of the dynamic logic will come in a follow-up.
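As a sketch of the new behavior (the function is hypothetical; OLLAMA_NUM_PARALLEL is the kind of explicit setting that would feed it):
```go
package sched

// numParallel returns the request parallelism for a model load.
// Previously this was derived from free VRAM; now the default is a
// fixed 1, so parallelism is only ever set explicitly.
func numParallel(requested int) int {
	if requested > 0 {
		return requested // explicit user choice wins
	}
	return 1 // fixed default, no longer inferred from available VRAM
}
```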
Daniel Hiltgen authored
* API: expose context size of loaded models
* CLI: add context UX; this adds a column in the ps output to show the model's context size (see the sketch below)
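A minimal sketch of rendering such a column with Go's text/tabwriter; the column set and the example row are invented for illustration:
```go
package main

import (
	"fmt"
	"os"
	"text/tabwriter"
)

func main() {
	// Align columns the way a ps-style listing does; the CONTEXT
	// column is the addition described above. The row is made up.
	w := tabwriter.NewWriter(os.Stdout, 0, 8, 2, ' ', 0)
	fmt.Fprintln(w, "NAME\tSIZE\tCONTEXT")
	fmt.Fprintln(w, "example-model:latest\t4.7 GB\t4096")
	w.Flush()
}
```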
- 07 Jul, 2025 4 commits
Parth Sareen authored
Parth Sareen authored

Daniel Hiltgen authored
Switch a few constants to variables.

Jesse Gross authored
The root cause was an unclean upgrade; this code is fine. This reverts commit 45f216a9.

- 06 Jul, 2025 1 commit
Jeffrey Morgan authored

- 05 Jul, 2025 3 commits
Daniel Hiltgen authored
Usage example:
```sh
go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
```
Daniel Hiltgen authored

Vincent RAMPAL authored

- 03 Jul, 2025 1 commit
Daniel Hiltgen authored
Favor the dmg now.

- 02 Jul, 2025 1 commit
Daniel Hiltgen authored
This adds some extra logs to make the new engine a bit more consistent with the llama engine.

- 01 Jul, 2025 1 commit
XuKecheng authored

- 30 Jun, 2025 1 commit
Jeffrey Morgan authored

- 29 Jun, 2025 1 commit
Attogram Project authored

- 27 Jun, 2025 1 commit
Michael Yang authored